The price of progress


A new uranium enrichment technique approved by the US Nuclear Regulatory Commission could
have an impact on nuclear proliferation. This should have been taken into account.


Scientists and scientific journals are naturally inclined towards
progress. But there are exceptions: progress is not always better.

Last week, the US Nuclear Regulatory Commission (NRC) in
Rockville, Maryland, announced that it had issued a licence for a new
technology to enrich uranium using lasers, a process that is potentially
cheaper and better than current enrichment methods, but also
politically fraught and possibly dangerous (see Nature http://doi.org/jfr;
2012). The decision was unfortunate. The NRC should introduce
rules to ensure that its future moves are better informed.

Nuclear power and nuclear weapons both require uranium that has
been enriched to contain higher levels of uranium-235 than occurs
naturally. The 1970 nuclear non-proliferation treaty gives countries
the right to possess enrichment systems, but most are subject to strict
controls. Where facilities are declared as peaceful, they are subject
to inspection by the International Atomic Energy Agency, which
ensures that material is not diverted for military or criminal purposes.
National intelligence agencies, too, keep a careful eye on the activities
of friends and rivals worldwide.

Current techniques to separate uranium-235 from its heavier, non-fissile
sibling uranium-238 are hard to hide. They typically involve
enormous gaseous-diffusion plants, or smaller, energy-intensive
centrifuge facilities that spin the two isotopes apart in high-speed
machines. Both require lots of space and electricity, which means
that such facilities are easy to spot with, say, satellite reconnaissance.

Lasers are more discreet. When finely tuned, they can ionize
uranium-235, which can then be collected on a negatively charged plate. The
new technique will require considerably less space and electricity than
centrifuge or diffusion plants.

GE Hitachi, the multinational company pursuing laser enrichment,
describes it as a “game-changing technology”, and those who
fear nuclear proliferation agree. Understanding where enrichment
facilities are and how they operate is crucial to maintaining the treaties
and agreements that limit the spread of nuclear technology. When
undeclared enrichment facilities are uncovered, as happened in Iran
in 2009, it can spark an international crisis. Even more worrying is the
possibility that some facilities can simply never be detected. Mistrust
could lead to regional arms races and even open conflicts.

When the NRC approved GE Hitachi's application for a laser enrichment
plant, it considered physical security, such as the height of the
fence around the plant. But the commission did not explicitly address
the existential threat that a plant might pose to the non-proliferation
regime, nor did it concern itself with the message that the pursuit of
this technology might send to other nuclear nations.

The American Physical Society (APS), based in College Park, Maryland,
would like to change that. Backed by environment groups and
scientists, the society has proposed that the NRC change its regulations
to require future applicants to submit a ‘proliferation review’.
This would force the commission and corporations to consider the
proliferation consequences of their proposed facilities. It would also
provide a public focus for debate, much as environmental-impact 
statements have allowed activists and citizens to query the environ- 
mental aspects of construction. 

Could such a review have changed the outcome of the NRC's licence
decision? Perhaps. From a commercial standpoint, the benefit of laser 
enrichment is clear, but from a broader societal perspective, it is
questionable. The cost of fuel for nuclear reactors is dwarfed by the cost
of construction, and this has stalled a nuclear renaissance in the West.
A 2010 analysis showed that the savings to households would be less than
US$10 per year from laser enrichment, a saving that many homeowners might
be unwilling to accept if presented with the risks the technology poses
(see F. Slakey and L. R. Cohen Nature 464, 32-33; 2010).

The APS’s proposed rule change will be voted on by the five NRC
commissioners in November. Nature urges the commissioners (who 
were not directly involved in approving the laser enrichment system) 
to approve the change. To consider something as vague as prolifera- 
tion might seem beyond the remit of a regulator, but it is not. Under 
the Atomic Energy Act of 1954, the NRC is charged with considering 
whether its licences would “be inimical to the common defense and 
security” of the US public. Nuclear technologies that are potentially 
destabilizing pose a threat to the security of citizens everywhere, and 
so the commission could rightly consider it under an altered licensing 
process. In the case of laser enrichment, as with other technologies, 
progress should be judged on more than the technical merits alone.


Power cuts 


China’s slumping renewable-energy industry
should be learnt from, not dismissed. 


On 3 August, the head of a Chinese solar-energy company leapt
to his death because his firm couldn't pay off a loan. Last month,
LDK Solar in Xinyu City cut its earnings forecast for the second con- 
secutive time, and watched its stock drop to one-quarter of what it 
was a year ago after it defaulted on payments. Suntech Power in Wuxi, 
the world’s biggest producer of solar panels, announced plans to slash 
production and was forced to accept a loan worth US$32 million from
the local government to stay afloat. China's wind-turbine manufactur- 
ers are also being forced to watch sales drop in an overcapacity market. 


These are testing times for the renewable-energy boom in China.

China, which produces more solar panels and wind turbines than
any other country, likes to promote its renewable-energy industry
as a glowing symbol of a technology-driven, green future. But, over 
the past few years, a glut in the international market, a drop in prices 
and import tariffs introduced by countries including the United States 
have left companies with over-supply and debt. 

Observers have jumped on the downturn with criticisms of how the 
Chinese government has supported and protected renewable-energy 
companies. A widely quoted piece in the Shanghai-based newspaper 
First Financial Daily claimed that the slump signalled the “imminent 
collapse” of the country’s photovoltaics industry. A gloating editorial 
in The Wall Street Journal in August used events in China to warn 
the US government against support for its own renewables industry. 

Strong stuff, but perhaps not unexpected. It is clear that some com- 
panies overshot the market, and there are some grounds for claims 
that China has engaged in unfair trade practices. When the coun- 
try entered renewable energy, it instituted protective measures such 
as requiring that 70% of turbines sold in China be produced with 
domestically made parts, but those measures have been removed. 
Many renewables firms have failed and can no longer be propped up 
by supportive local governments. 

However, it is foolish to decide that China’s renewables industry has 
been — or will be — a failure on the basis of its current problems. The 
scale of some of these problems has been overstated. The Wall Street 
Journal, for instance, blamed decreased revenue at power companies 
on underperformance at wind farms, when in fact it had more to do 
with the increasing price of coal. 

By whatever means it has been achieved, and whatever else might
be said about it, China's investment in renewables has been a remarkable
project. Just seven years after a renewable-energy law threw government
support behind the industry, China went from having almost
no stake in the international market to leading the manufacture of solar
photovoltaics and wind turbines, in very competitive industries. China 
has developed know-how and manufacturing bases: it has the engi- 
neers, and it is ready to lead the industry into the future. 

The country’s targets will carry the industry forward. Some 62 gigawatts
of wind power are currently installed, more than in any other country,
and the government has set a target of 200 gigawatts by 2020. An even
bigger difference will come when the domestic demand for solar energy
picks up, which it surely will. Until now, nearly all of China’s
photovoltaic units have been exported; domestic use has increased but
remains at a relatively low 3.1 gigawatts.

The Chinese government is establishing policies that will encourage this. A new
renewable-portfolio standard, to be implemented by the end of this 
year, will force power companies to generate a mandatory proportion 
of their energy from renewables, with penalties for those that do not. 
An upgrade of long-distance power lines, to transfer energy from wind 
farms or megasolar plants in the west of China to the energy-hungry 
east, was approved last year. Consumers will be forced to share the 
burden when a surcharge on electricity from renewables kicks in. 

Chinese renewable energy has certainly hit a low point. Many of 
the 80 or so companies that produce wind turbines will probably have 
to close. But what The Wall Street Journal called a waste of time and 
money can be seen as healthy competition in an immature market. As 
China’s renewables industry reorganizes and restructures itself, there 
may be reasons not to emulate it. But the government, for reasons 
relating to pollution, climate change and energy security, is firmly 
behind the industry. And it has built itself a solid platform from which
to push on.


Life sciences 


Survivors of the 2010 University of Alabama 
shooting chose not to push for the death penalty. 


Amy Bishop, the biologist who murdered three colleagues in
cold blood and grievously wounded two others, will spend
the rest of her life behind bars after an Alabama court sentenced
her last week. The Harvard-trained assistant professor had
been denied tenure at the University of Alabama in Huntsville. In
February 2010, months after her appeal against the decision failed,
Bishop, a mother of four, pulled out a 9-millimetre pistol during a
faculty meeting in a tiny conference room.

Without saying a word, Bishop methodically shot down fellow biologists
Maria Ragland Davis, Adriel Johnson and department chairman
Gopi Podila. A bullet to the head of colleague Joseph Leahy left him,
after many months of recovery, blind in his right eye and partially
sighted in his left. Staff assistant Stephanie Monticciolo, the department’s
mother hen, had the teeth on one side of her mouth knocked
out. She sustained shattered sinuses and a broken jaw, and was blinded
in one eye.

“Many, many things are better than I could have ever hoped,”
Monticciolo’s adult daughter Michele posted on a blog 18 months
later. “Some things, however, will never be the same.”

The same is true in the biology department at Huntsville, two
and a half years after the shooting. Yet signs of a determined recovery
abound. Ten new graduate students enrolled in August. The
department has made university biochemist Debra Moriarity its
chairwoman, and has hired two new faculty members. Leahy, a microbiologist,
is back teaching full-time as of this term. And last week,
structural biologists on the faculty hosted an international conference
on the crystallization of biological macromolecules, attended by more 
than 200 scientists. 

In the state of Alabama, there are only two possible sentences for 
capital murder, with which Bishop, now 47, was charged: life in prison 
with no possibility of parole, or death by lethal injection or electrocu- 
tion. Prosecutors said almost from the outset that they would seek 
the death penalty. 

At first, Bishop pleaded not guilty “by reason of mental disease or 
defect”. Then, weeks before the trial was set to open last month, a 
heartening tale of human generosity began to unfold. It emerged that 
the spouse of one of the murdered biologists had written to judge Alan 
Mann, who would have the final say over the sentence if Bishop was 
found guilty. The letter-writer noted that his or her family had suffered 
greatly, but added that they could see no benefit in the loss of another 
life. The writer asked Mann to spare Bishop the death penalty. 

The letter prompted Bishop to offer, through her lawyers, to change 
her plea to guilty if the prosecutors would drop their pursuit of the 
death penalty. The prosecutors sounded out the other survivors: the 
nine who had been in the conference room at the time of the shooting, 
and the families of the dead. None wanted the death penalty. A deal 
was reached and Bishop changed her plea to guilty. 

Robert Broussard, the lead prosecutor on the case, told Nature 
this week that the common sentiment among the survivors “abso- 
lutely” swayed him not to seek the ultimate punishment. And so, 
on 24 September, after a brief trial, Bishop was sentenced to life 
behind bars. 

In 25 years of prosecuting murders, Broussard said, he has never 
seen such equanimity in so many people 
affected by a violent crime. Those who will 
spend a lifetime bearing the wounds that Amy 
Bishop inflicted, inside and out, reached deep 
and found mercy.


Gene therapies need new development models

As with other medicines, the approval of gene therapies should hinge on a
risk-benefit analysis for the patient, argues Fulvio Mavilio.

Is gene therapy finally becoming a reality? The European Commission is
poised to authorize, for the first time in the Western world, the
commercialization of a gene-therapy product. Called Glybera (alipogene
tiparvovec), it is designed to treat a rare genetic defect involved in fat
metabolism.

Success has been a long time coming. Gene therapy was first administered
more than 20 years ago, to a child who had a rare disorder of the
immune system called adenosine deaminase (ADA) deficiency. Since
then, it has struggled to find its place in medicine amid a roller coaster of
successes and setbacks, hype and scepticism that has little precedent in
modern times. Although the approval of Glybera is a positive move, it is
unlikely to herald a new age of gene therapies, not without significant
changes to the system. It is no coincidence that no gene therapy has yet
been approved in the United States and that no other gene-therapy product
is being considered by regulators in Europe.

Here is why. The design, development and manufacture of products such
as Glybera (a virus engineered to carry a correct copy of the defective
gene) is complex and done mostly in academic centres. Yet legislation
introduced in the past decade in Europe and the United States demands
that these products be produced under the same rules that cover
conventional drugs, in establishments operated with industry-like
standards and certified by government agencies.

This is a formidable challenge for academic centres, which tend to lack
the necessary human and financial resources. So why is the development
of gene therapy focused there, and not in industry, which seems better
suited?

The first reason is the financial uncertainty generated by the complex,
confused and poorly harmonized regulatory environment, as
the history of Glybera shows. At first, the application for its authorization
received a negative opinion from two committees at the European
Medicines Agency (EMA): the Committee for Advanced Therapies
(CAT) and the Committee for Medicinal Products for Human
Use (CHMP). Only when another body, the Standing Committee of
the European Commission, asked the EMA to reconsider the application
in a restricted indication did the CHMP eventually recommend
approval under “exceptional circumstances”, requiring post-marketing
studies and the set-up of a restricted-access programme. The Dutch firm
Amsterdam Molecular Therapeutics, the inventor of Glybera, did not
survive the process, and became known as uniQure after refinancing.

Lack of resources is a second reason. For many years, the drug industry
stayed away from gene therapy, perceiving it as a dangerous technology
of dubious efficacy that was too complex to develop and targeted too
small a market.

There are some positive signs, because this last
perception, at least, is changing: the industry now recognizes that rare
diseases and orphan-drug legislation provide attractive opportunities.
Some recombinant proteins and monoclonal antibodies originally 
developed as orphan drugs have been repurposed for larger indications. 

An example of how academia and industry could cooperate comes 
from the recent alliance between the drug giant GlaxoSmithKline (GSK) 
in London, and the charity-funded San Raffaele Telethon Institute
for Gene Therapy (TIGET) in Milan, Italy. GSK gained an exclusive 
licence to develop and commercialize the ADA treatment, and will co- 
develop with TIGET gene therapies for six more genetic diseases. The 
contribution of public or charity-funded organizations in early devel- 
opment phases lowers the cost and risk of investing in diseases with a 
tiny market, and gives the industry access to technologies that can be 
expanded to more profitable applications, thereby repaying the invest- 
ment and allowing resources to be fed back into 
rare diseases. Unfortunately, promising therapies 
for hundreds of orphan diseases are unlikely to 
attract similar industrial interest. 

So, how do we ensure that scientists will con- 
tinue to develop such treatments? Should they all 
turn to the ‘hospital exemption’, which permits
experimental therapies to be manufactured and 
used under the responsibility of a physician with- 
out regulatory supervision? 

That should not become standard practice. 
Governments, funding agencies, scientists and 
patients’ associations must together come up 
with new models. Public funds could be used to 
pay for centralized manufacturing facilities or to 
subsidize enterprises with the necessary exper- 
tise to get involved, as is done for vaccines. And regulators should look 
again at product definition and the pathway to market. 

The complex combination that forms the basis of the ADA therapy 
makes it somewhere between a biotherapeutic and a transplantable 
organ, so it hardly meets the definition of a ‘medicinal product’ and 
should therefore be regulated differently. Moreover, the amount of 
preclinical data and post-treatment monitoring currently required to 
authorize a treatment is hardly justified when preclinical models are 
uninformative and patients have no therapeutic alternatives. 

The major factor in deciding whether to authorize an experimental 
treatment should be a risk-benefit analysis for the patients. Applying
a different standard to gene therapy is unfair, slows down its develop- 
ment, discourages investment and ultimately denies people the right 
to have timely access to possible cures.


Fulvio Mavilio is scientific director of Genethon in Evry, France, and a 
professor of molecular biology at the University of Modena and Reggio 
Emilia in Italy. 

e-mail: fmavilio@genethon.fr 
RESEARCH HIGHLIGHTS

Selections from the scientific literature


Missing Galactic baryons spotted

More than half of the expected number of baryons (subatomic particles
that make up our everyday world, such as protons and neutrons) in the
Milky Way are unaccounted for, but Anjali Gupta of Ohio State University
in Columbus and her colleagues seem to have spotted the missing matter.
They analysed X-ray satellite data and found evidence of a hot gas halo
around the Galaxy extending for 100 kiloparsecs. The authors estimate the
mass of the halo to be between 10 billion and 60 billion times the mass
of the Sun. The cloud could account for the missing particles, they
conclude.

Astrophys. J. Lett. 756, L8 (2012)


Mother’s stress slows learning

Female sticklebacks that are confronted by predators while producing
eggs generate offspring with impaired learning abilities.

Katie McGhee of the University of Illinois at Urbana-Champaign and her
colleagues used a fake predator to repeatedly chase one set of female
threespined sticklebacks (Gasterosteus aculeatus; pictured), while
leaving another set to produce their eggs in peace. Adult offspring from
both groups initially showed similar performances in a task in which the
animals learned to associate the colour blue with a food reward. But
after five days of the task, the offspring of mothers exposed to
predators took twice as long to find the food as did the control group.

At least in these fish, maternal stress can have long-lasting effects on
the learning ability of offspring, the authors say.

Biol. Lett. http://dx.doi.org/10.1098/rsbl.2012.0685 (2012)


Feeding habits of the vampire squid

Drifting in the deep ocean, the vampire squid (Vampyroteuthis infernalis;
pictured) has features of both octopuses and squid. Researchers have now
worked out what and how this mysterious creature eats.

V. infernalis, which is related to octopuses and squid, has eight arms
and, instead of the feeding tentacles used by squid to capture prey, has
two long, retractile filaments. Hendrik Hoving and Bruce Robison at the
Monterey Bay Aquarium Research Institute in Moss Landing, California,
studied the feeding behaviour of V. infernalis using deep-sea video
recordings, lab feeding studies and morphological examinations. They
conclude that the filaments help the animals to capture food, which
includes zooplankton, crustacean remains and even faeces.

The filaments are homologous to the arms of octopuses and other
cephalopods, although the creatures’ feeding habits are very different,
the authors suggest.

Proc. R. Soc. B http://dx.doi.org/10.1098/rspb.2012.1357 (2012)


Responses vary in autistic brains

Autism may emerge from a general unreliability of neuronal responses in
the brain’s cortex, rather than from a deficiency in one particular
brain area or circuit.

Ilan Dinstein at Carnegie Mellon University in Pittsburgh, Pennsylvania,
and his colleagues studied 14 people with autism and 14 people without
the disorder. The researchers used functional magnetic resonance imaging
to monitor cortical responsiveness as they stimulated the participants’
sight, hearing and touch in dozens of trials.

Responses in the visual, auditory and somatosensory areas of the cortex
all varied much more between trials for the volunteers with autism than
for the controls. The authors propose that this may reflect inappropriate
development of neuronal connections, or synapses, in the autistic brain.

Neuron 75, 981-991 (2012)
ATHEROSCLEROSIS


Fatty plaque link 
to inflammation 


A major constituent of the 
fatty arterial plaques seen in 
heart disease may dampen 
the inflammation that drives 
the disease. A cholesterol- 
precursor molecule may be a 
mediator of this suppression. 

Atherosclerosis progresses 
as cholesterol-filled immune 
cells called macrophage foam 
cells accumulate in arterial 
walls. Christopher Glass at 
the University of California, 
San Diego, and his colleagues 
found that in mice fed a 
high-fat, high-cholesterol 
diet, these cells were linked to 
suppression of inflammation. 
Within the foam cells, 
desmosterol, an intermediate 
in the cholesterol biosynthesis 
pathway, accumulated to 
significantly higher levels 
than did other intermediates. 
Moreover, desmosterol was 
also abundant in human 
atherosclerotic lesions. In the 
mice, desmosterol seemed to 
suppress genes that promote 
inflammation during foam-cell 
formation. 

Synthetic formulations of 
desmosterol might intervene 
in cardiovascular disease, the 
authors suggest. 

Cell 151, 138-152 (2012) 


Homing in on a
black hole’s jets


Astronomers have for the first
time located the launch site
of a giant, high-speed jet of
charged particles believed to
originate from a supermassive
black hole.

Black holes suck up 
large amounts of gas and 
dust, which swirl around 
the black hole and are 
thought to feed these jets 
(simulated image pictured). 
Sheperd Doeleman of the 
Massachusetts Institute 
of Technology’s Haystack 
Observatory in Westford and 
his colleagues linked four 
radio dishes in California, 
Arizona and Hawaii to make 
a single large telescope. Using 
this, they examined the jets’ 
source region, which lies just 
outside the 6.2-billion-solar- 
mass black hole at the centre 
of the galaxy M87. 

The launch site was 
small, suggesting two 
key properties about the 
gravitational giant: the black 
hole spins, and it feeds on a
surrounding disk of material 
that orbits it in the same 
direction as its spin. 
Science http://dx.doi.org/10.1126/science.1224768
(2012)
For a longer story on this research, 
see go.nature.com/tqrwgb 


CELL BIOLOGY


A way to catch dividing cells

A genetic tool can help biologists to pinpoint rare, replicating cells
in tissue samples from adult mice. Finding and studying dividing cells
is important in learning about growth, healing and other key processes,
but researchers have been hindered by the difficulty of isolating these
cells alive. To overcome this problem, Amir Eden at the Hebrew University
of Jerusalem and his colleagues created a mouse strain in which the gene
for a fluorescent protein was fused to a gene that is active when cells
divide. After dissecting livers from these mice, the researchers could
sort dividing from non-dividing cells and compare their gene expression.
Genes for liver specialization, or differentiation, were less active in
replicating cells. This fluorescent marker could help scientists to
isolate and study dividing cells in multiple tissues and biological
assays.

Dev. Cell http://dx.doi.org/10.1016/j.devcel.2012.08.009 (2012)


HIGHLY READ


Bendable battery yields flexible LED

Thanks to an energy-dense flexible lithium-ion battery, researchers have
built a thin, bendable light source using an organic light-emitting diode
(LED). The device could one day be incorporated into rollable or
implantable electronics.

Lithium-ion batteries are among the best candidates for
flexible power sources, but their electrodes could previously
be made with only a few, low-performing materials. Now,
Keon Jae Lee at the Korea Advanced Institute of Science
and Technology in Daejeon and his colleagues have created
a bendable battery by using existing methods to apply
lithium-based electrodes onto a brittle mica surface at high
temperature and then peel off the mica substrate. They then
used a technique that they had devised to transfer the battery
onto a flexible polymer. The method allows the incorporation
into a flexible battery of almost all the high-performance
materials that are used in rigid batteries.

Nano Lett. 12, 4810-4816 (2012)


Zebrafish find 
light without eyes 


Eyeless zebrafish larvae may 
still find their way out of 
darkness by the activation of 
light-sensitive neurons deep 
inside the brain. 

Harold Burgess at the National Institute of Child Health and Human
Development in Bethesda, Maryland, and his team found that the
transparent zebrafish larvae (Danio rerio; pictured, with certain brain
cells in green) swim gradually towards the illuminated areas of their
tank, even after their eyes have been removed. This behaviour suggests
the presence of light-responsive neurons outside the conventional visual
organs. Engineered eyeless fish that express less Opn4a, a
light-sensitive molecule in the brain, responded poorly to light, whereas
fish that produced more Opn4a performed better.

The researchers conclude 
that neurons expressing 
Opn4a in the preoptic area 
of the brain may support 
simple, and perhaps 
primordial, light-seeking 
behaviours. 

Curr. Biol. http://dx.doi.org/10.1016/j.cub.2012.08.016
(2012)
SEVEN DAYS


POLICY 


French science safe 


France’s research and 
higher-education ministry 
received a 2.2% boost in the 
government's 2013 austerity 
budget on 28 September, 
escaping the cuts imposed on 
many other ministries. The 
increase will pay for 1,000 
new university posts. Funding 
for research grants will also 
rise by 1.2% to €7.86 billion 
(US$10.1 billion) — a cut in 
real terms if inflation averages 
above 1.75% as expected. See 
go.nature.com/7ialrr for more. 


Campuses struggle 
State funding for US public 
research universities has been 
insufficient to keep pace with 
rising student enrolment over 
the past decade, according to a
report by the National Science 
Board. Between 2002 and 
2010, state funding per student 
dropped in 43 of the 50 states, 
with cuts as high as 48%. The 
board warns that the trend 
threatens to severely hinder 
research and development at 
the nation’s 101 major public 
institutions, which train the 
majority of US scientists and 
engineers. 


Spanish pain 

Spain's research and 
development budget will be 
cut for a fourth year if the draft 
budget for 2013 is approved. 
Presented to parliament on
29 September, the draft budget
reduces science spending to
€5.9 billion (US$7.6 billion),
€461 million or 7.2% less than
in 2012. See go.nature.com/msiepf
for more.


US cuts loom

Federal research and development funds in the United States could be
slashed by US$57.5 billion over the next five years under an
across-the-board budget cut that is due to come into effect on
2 January 2013, according to an analysis released on 27 September. The
analysis, by the American Association for the Advancement of Science
based in Washington DC, predicts that the National Institutes of Health
stands to lose US$11.3 billion, or 7.6%, from its R&D budget as a result
of the ‘sequester’ unless Congress can agree on an alternative budget
plan to lower the federal deficit. See go.nature.com/jjrly8 for more.


Ancient stream on Mars

NASA’s Mars rover Curiosity has discovered evidence that
water flowed at the bottom of Gale Crater billions of years ago.
Although scientists have found many hints of water on Mars
before this discovery, rounded gravel pieces photographed by
Curiosity (left) are similar to rocky outcrops on Earth (right).
These suggest that the stream coursed at speeds of around one
metre per second and was at least ankle-deep. See
go.nature.com/fyogfs for more.


Nuclear concerns 


Plans to build a controversial 
type of uranium enrichment 
plant were given the green 
light by the US Nuclear 
Regulatory Commission on 
25 September. Critics fear that 
the technology to be used at 
the facility in Wilmington, 
North Carolina, proposed 
by General Electric-Hitachi
Global Laser Enrichment, 
could lead to the proliferation 
of nuclear weapons. See page 5 
for more. 


Fusion failure 

The National Ignition Facility, 
a US$3.5-billion laser fusion 
facility at Lawrence Livermore 
National Laboratory in 
Livermore, California, did 
not meet a 30 September 
congressional deadline for 
‘ignition’ (see Nature 483,
133-134; 2012). That is the 
point at which the energy 
produced by the fusion 
process surpasses that put into 
the laser shot to trigger fusion. 


Mars mission plan 


NASA's Mars programme 
should retain a focus on 
searching for evidence of past 
life, rather than on looking for 
life today, according to a study
released on 25 September.
The report, by NASA’s Mars
Program Planning Group,
comes in the wake of the
agency’s withdrawal from
international missions in 2016
and 2018. NASA’s next Mars
mission is likely to be either an
orbiter in 2018 or a rover
in 2020, the study suggests.
See go.nature.com/wldhnw 
for more. 


Saved species list 


At its conference in Jeju, 
South Korea, earlier this 
month, the International 
Union for the Conservation 
of Nature (IUCN) agreed to 
produce a ‘green list’ of fully
conserved species. The news
was announced last week by
the Wildlife Conservation
Society, based in New York,
which co-sponsored the
motion. The list will serve as 
a counterpoint to the IUCN’s 
‘red list’ of endangered species. 
See go.nature.com/tqhvdb 
for more. 


BUSINESS

Trials monitor

In an announcement on 26 September, the US
Department of Health and Human Services charged the
Food and Drug Administration with monitoring whether
data for clinical trials of drugs and medical devices are
incomplete, false or misleading, and notifying the companies
responsible. It is unclear, however, what the penalties
will be for companies that fail to comply.
See go.nature.com/tv24eb for more.


RESEARCH

Coronavirus cases

Scientists in the Netherlands have deposited the full
sequence of a new coronavirus that is thought to have caused
a respiratory illness in a person from Saudi Arabia who died
from the infection in June (GenBank accession number
JX869059; see go.nature.com/g343qd). The sequence
matches the partial sequence of a virus from another patient
with similar symptoms who was transferred from intensive
care in Qatar to London in early September. See page 20
for more.


Ape habitat shrinks 


The first continent-wide 
survey of African great-ape 
habitat has reported a massive 
decline between 1995 and 
2010. A paper published on 
23 September (J. Junker et al.
Divers. Distrib. http://doi.org/jfv;
2012) shows that Cross
River gorillas (Gorilla gorilla
diehli) have experienced
a 59% loss in their habitat
during this period; bonobos
(Pan paniscus) 29%; and
central chimpanzees (Pan
troglodytes troglodytes) 17%.
See go.nature.com/brilxf
for more.

Bas-Congo virus

A virus that causes fever and bleeding and that killed two
teenagers and infected a nurse in the Democratic Republic
of Congo in 2009 has been identified as a new form of
rhabdovirus (pictured). According to a paper
published on 27 September, Bas-Congo virus is from the
same family as rabies, and antibody tests suggest that
it could be transmissible between humans (G. Grard
et al. PLoS Pathog. 8, e1002924; 2012).


TREND WATCH

Pharmaceutical companies paid a record amount in
malpractice fines in the United States in the first half of 2012.
A report by Public Citizen, a consumer group based in
Washington DC, found that by 18 July, US$5 billion in federal
settlements and $1.6 billion in state settlements had been
made for activities such as illegal marketing and overcharging
of government medical programmes. Settlements with
individual states have steadily increased since 2009.

Source: Public Citizen


Comet discovery

The discovery of comet C/2012 S1 (ISON)
was announced by the
International Astronomical
Union's Central Bureau for
Astronomical Telegrams in
Cambridge, Massachusetts,
on 24 September. The comet is
named after the International
Scientific Optical Network,
which includes the Russia-
based telescope that spotted
it. Some astronomers suggest
that the comet might be bright
enough to be seen during
the day at around the time
it brushes past the Sun in
November 2013; however,
such early predictions often
prove unreliable.


Element 113 


Researchers in Japan have
made their third atom of
element 113, a feat that
could give them the right
to name it. Russian and US
researchers may already have
created atoms of the element
in earlier experiments, but
their attempts failed to satisfy
the body of experts responsible
for deciding on the matter.
The experts, drawn from the
International Union of Pure
and Applied Chemistry and
the International Union of
Pure and Applied Physics,
have yet to report on the
Japanese claim. See
go.nature.com/cxj91x for more.


Ecologist dies 


Pioneering environmentalist,
ecologist and one-time
US presidential candidate
Barry Commoner died
on 30 September, aged 95.
The scientist rallied against
poverty, pollution and nuclear
testing and had a key role in
the first Earth Day in 1970.
He ran unsuccessfully for
president in 1980, garnering
234,000 votes.


Chemist accused 


A Russian chemist accused of aiding attempted drug
trafficking is reportedly facing fresh charges. Olga Zelenina,
a narcotics expert at the Penza Agricultural Institute, was
released on 25 September from pre-trial detention. But
according to Russian news reports, she has since been
accused by the Russian Federal Drug Control Service of
having produced her report on the amount of opiates in
a seized shipment of Spanish opium poppy seeds without
the necessary permission from her institute, even though
the report bears the signature of her institute director,
Alexander Smirnov. Smirnov has not responded to Nature’s
request for clarification.


Penalties paid out by pharmaceutical companies in the United
States have already reached record levels this year.


COMING UP

7 OCTOBER
Private spaceflight firm SpaceX, based in
Hawthorne, California, is hoping to launch its
first NASA-contracted cargo-resupply mission
to the International Space Station.
www.spacex.com

8-10 OCTOBER
The winners of the 2012 Nobel prizes in
physiology or medicine, physics and chemistry
are announced in Stockholm.
www.nobelprize.org


Polar expert cleared 


Charles Monnett, a researcher 
at the US interior department’s 
Bureau of Ocean Energy 
Management, headquartered 
in Washington DC, has
been cleared of scientific
misconduct after a 
government investigation. 
Monnett had been accused of 
publishing false data in a paper 
that suggested four drowned 
polar bears had died while 
swimming in search of sea ice 
(C. Monnett and J. S. Gleason 
Polar Biol. 29, 681-687; 
2006). He was, however, 
reprimanded for leaking 
government documents that 
later helped environmental 
groups to sue the government. 
See go.nature.com/dfio50
for more.


Under the Affordable Medicines Facility - Malaria programme, shopkeepers sell malaria treatments alongside groceries and over-the-counter medicines.


PUBLIC HEALTH 


Malaria plan under scrutiny 


Lack of data and donor uncertainty leave public-health experiment on the rocks. 


BY AMY MAXMEN 


Is showering a country with low-cost
malaria drugs the best way to stem the
disease? As a US$463-million pilot programme
to test the strategy in seven African
countries winds down, public-health experts
are questioning whether the approach makes
sense given shrinking global health budgets
and a steady decline in malaria prevalence.
Although no official decision has been
announced about whether to continue the
programme, known as the Affordable Medicines
Facility - Malaria (AMFm), many of
those familiar with it have told Nature that it
must change or be phased out after this year.

“For me, the problem is that it has not been
proven that the AMFm made a difference to
malaria,” says Alan Court, senior adviser to
the United Nations special envoy for malaria.
“There has to be a public-health purpose or else
there is no purpose.” Court is chair of the working
group expected to recommend a path for the
programme’s future on 14-15 November during
a board meeting of the Global Fund to Fight
AIDS, Tuberculosis and Malaria, which is based
in Geneva, Switzerland, and oversees the AMFm.


When the AMFm was conceived in 2004, 
malaria rates in developing countries were sky- 
rocketing, in part because the malaria-causing 
parasite Plasmodium falciparum had become 
resistant to the drug chloroquine. Officials 
worried that resistance would also develop 
to artemisinin — a newer and more effective 
drug — if people did not combine it with other 
therapies, but such combinations were much 
more expensive than either chloroquine or 
artemisinin alone. 

The AMFm aims to make artemisinin- 
based combination therapies (ACTs) readily 
available and affordable in malaria-ridden 
countries by relying on the free market for 
their distribution. The AMFm subsidizes
ACTs for drug importers in the pilot countries 
(Ghana, Kenya, Madagascar, Nigeria, Niger, 
Uganda and Tanzania), who then distribute 
the drugs to public and private vendors at low 
cost. Vendors can sell the drugs to anyone, 
even people who have not been diagnosed 
with malaria. The point is to make the drugs 
accessible to those who would otherwise not 
get them — such as children and 
people in rural areas. “I grew up 
very far from public-health facili- 
ties” in Cameroon, says Emmanuel 
Nfor, acting director of the AMFm, 
“so to me, the AMFm is not only a 
good idea, it is an excellent idea”. 

Yet, although the AMFm sub- 
sidized 60% of the world’s supply 
of ACTs last year, it is unclear how 
many of the drugs reached the pilot 
programme's target populations. 
Because anyone can buy the pills, 
people who can afford them may 
take them whenever they feel ill, 
whether or not malaria is the culprit. 
And although children under the age 
of five account for 86% of malaria 
deaths worldwide, the AMFm esti- 
mates that in 2011, only 36% of the 
subsidized ACTs bought by the pri- 
vate sector were for children’s formu- 
lations. 

The Global Fund commissioned 
an independent evaluation by ICF 
International, a consulting firm in 
Fairfax, Virginia, and the London 
School of Hygiene and Tropical Medicine 
(LSHTM), UK, to look at the impact of the 
AMFm, but it has not settled the debate over
the programme’s effectiveness. “We see that the
AMFm has been a game-changer in the pri-
vate for-profit sector,” says Sarah Tougher, a
health economist at the LSHTM and part of the 
committee that evaluated the study (the pre- 
liminary report is available at go.nature.com/ 
nsc6ck). According to Tougher, large changes 
in the price and availability of ACTs, as well 
as the share of the market supplied by private 
vendors, “were achieved in just a few months”. 

But these benchmarks miss the point, says
Mohga Kamal-Yanni, a senior health adviser
at Oxfam GB in Oxford, UK. “You don’t need a
huge independent evaluation to calculate that
a huge subsidy will permit shopkeepers to buy
and sell more of a drug,” she says. “Sales don’t
mean anything unless you know who the sales
are for.”

Testing before treatment was not considered
essential when the AMFm was proposed
because easy-to-use diagnostic tests were
not available at the time. They are now, and
they are also more relevant than they were in
2004 because malaria prevalence has declined
since then. Today, in some AMFm-supported
nations, a child with a fever is as likely to have
pneumonia or another fever-inducing ailment
as to have malaria.

[Figure: ‘Hot zones’. Map of the number of reported malaria cases in 2010,
with AMFm pilot countries marked. The Affordable Medicines Facility - Malaria
(AMFm) project includes only some of the countries where malaria is most prevalent.]

With no way to know how much of the
subsidized drugs have been wasted, leaders
in global health will struggle to convince
donors that the programme is cost effective.
No one has yet stepped forward to fund the
next phase, and the AMFm working group
is debating modifications and exit strategies.
“We are grappling with the cost of all of this,”
Court says. “For example, we know that the
vast majority of people who die of malaria are
children; should we just target them?”

In one future scenario, a transformed
AMFm would subsidize paediatric doses,
which cost less than the adult formulations.
But Albert Peter Okui, a programme manager
for the National Malaria Control Programme
in Kampala, Uganda, points out that adults
could simply take multiple children’s doses if
their versions are no longer on store shelves.

Another option is to reduce subsidies in 
countries with smaller burdens of malaria. 
Court says that this could allow for an expan- 
sion of the programme beyond the pilot 
countries, which were selected on 
the basis not only of malaria burden 
but also because private shops were 
heavily involved in drug distribu- 
tion and because the countries had 
handled prior Global Fund grants 
effectively. An expanded programme 
could potentially reach countries 
such as the Democratic Republic 
of Congo, which in 2010 had more
reported cases of malaria than four 
of the pilot countries — Ghana, 
Madagascar, Niger and Nigeria — 
combined (see ‘Hot zones’). 

The United States, represented by 
the President’s Malaria Initiative, has 
a seat in the AMFm working group. 
In a document posted online on 
28 September, the initiative states 
its concerns with the programme, 
including the overuse of ACTs by 
people who did not need them. The 
United States did not support the 
AMFm pilot directly because offi-
cials questioned whether a top-down
subsidy to importers would get drugs 
to the most vulnerable groups. 

Ultimately, donors will dictate whether 
the AMFm marches on, says Oliver Sabot, 
executive vice-president at the Clinton Health 
Access Initiative in Boston, Massachusetts, 
and a member of the AMFm working group. 
“People are committed to working their hearts 
out to find technical solutions,” he says, “but if
there’s not enough funding we will need to roll
back the whole thing.”

So much uncertainty just weeks before the 
decision is nerve-racking for officials in the 
pilot countries and for drug importers, hospi- 
tals and small pharmaceutical companies that 
depend on the subsidies. “We should be told 
about this,” says Sigsbert Mkude, a programme 
officer at the National Malaria Control Pro- 
gramme in Dar es Salaam, Tanzania. “We have 
little information about what comes next.”
Progeria is one condition that has benefited from a redeployment of ‘failed’ drugs.


DRUG DEVELOPMENT 


New cures sought 
from old drugs 


Researchers to re-examine compounds shelved by industry. 


BY MEREDITH WADMAN 


Progeria is a rare, lethal disease that ages
children so rapidly that they seem to be
80 years old when they are just 10. So
patients and families celebrated the news last 
week of the first therapeutic success against 
the disease — as did advocates of the notion 
that abandoned drugs can rise again. A clini- 
cal trial has shown improved symptoms in 
children with the disease who used lonafarnib 
(L. B. Gordon Proc. Natl Acad. Sci. USA
http://doi.org/jfz; 2012), a drug developed by Merck
in the 1990s that failed against its original 
target, head and neck cancer. 

If resurrecting an ineffective drug works for 
progeria, why not for other conditions, asks 
Francis Collins, director of the National Insti- 
tutes of Health (NIH) in Bethesda, Maryland, 
whose lab in 2003 identified the mutated gene 
that causes progeria. “And why not seek those 
answers by ‘crowd-sourcing’?” he asks. 

Roughly 30,000 drugs have been shelved 
by the pharmaceutical industry over the past 
three decades, and medical funders are now 
inviting researchers to find a new future for 
some of them. This month, the UK Medical 
Research Council is expected to announce the
initial awards in a £10-million (US$16-million)
programme aimed at repurposing 22 stalled 
compounds developed by London-based 
AstraZeneca. And this week, the NIH’s National 
Center for Advancing Translational Sciences 
(NCATS) will solicit the first full applications 
to a similar programme it launched in May. 
NCATS invited researchers to look for new 
uses for 58 abandoned compounds contrib- 
uted by eight big drug companies. All were 
taken through preclinical testing and into early 
human trials at a cost of millions of dollars, and 
all were found to be safe. They were abandoned 
either for lack of efficacy or for business reasons. 
The planned budget for the NCATS pro- 
gramme is just $20 million in 2013, but the com- 
petition seems to be fierce: by mid-August, the 
agency had received nearly 160 pre-applications 
for a maximum of eight awards. Kathy Hudson, 
acting deputy director of NCATS, says that the 
pre-applications sought to attack conditions 
as diverse as autism, Alzheimer’s disease and 
cancer. For many compounds, multiple uses 
were conceived; for one compound alone, the
agency received seven applications targeting six
diseases or conditions. This week, the most
promising will be asked to submit full applications.

The winners, to be announced in June 2013, 
will have to provide preclinical evidence of the 
biological rationale for the compound’s new 
use. The company then gets to decide whether 
it wants to take that finding any further. 

John LaMattina, former president of global 
research and development at Pfizer in New 
York, thinks that the programme faces long 
odds because drug companies have already 
tested these compounds against many tar- 
gets. “There has been an awful lot of hype,” 
says LaMattina, who is now a senior partner 
at PureTech Ventures, a life-sciences venture- 
capital company in Boston, Massachusetts. Any 
new finding by an NIH grant recipient, he adds 
“is going to have to be pretty compelling for a 
company to take it forward”. 

Steven Potkin, a psychiatrist at the 
University of California, Irvine, worries that 
even compelling findings may not be enough 
to persuade companies to reinvest in a shelved 
compound. Potkin wants to repurpose a
compound originally aimed at depression that
he thinks could help with mood-regulation 
problems across a range of psychiatric dis- 
eases from bipolar disorder to schizophrenia. 
But the NIH requires the participating compa- 
nies to share the intellectual property created if 
grant recipients find new uses for a compound; 
a company can then license it back to develop 
the drug. “There is no guarantee that this will
happen,” Potkin says.

Hudson says the companies have agreed that, 
if they are not interested, the academic partner 
can either purchase the agent from the com- 
pany, or have it manufactured by a third party, 
so that the push towards the clinic can still proceed.
Then, “if the company elects not to pursue
the new indication, all is not lost”, Hudson says.

Chemists are grumbling about a different 
issue: the companies have not released the 
structures of the 58 compounds (although the 
programme's winners will learn the structures 
of their compounds). 


That has led to a “tremendous waste of effort” for chemists
trying to deduce any new activities that the compounds might
have, says Jeremy Berg, former director of the NIH’s National
Institute of General Medical Sciences in Bethesda, and now a
computational and systems biologist at the University of
Pittsburgh in Pennsylvania.

But some say that the NIH had to meet the 
industry halfway. “This project has required 
a delicate dance to get industry to share com-
pounds and data,” said Thomas Insel, who was
acting director of NCATS until 23 September, in 
an August e-mail to Berg. “My hope is that if this 
first step goes well, we can expect more sharing 
in the future.”
ASTRONOMY


The telescopes that 
came in from the cold 


Twin spy telescopes could drive US space astronomy forward, but at what cost? 


BY ERIC HAND 


When astronomer Alan Dressler
was invited to see what might be
the future of NASA’s astrophysics
programme, he had to leave his mobile phone
behind, lest he be tempted to grab a quick 
snapshot. As he and a dozen others passed 
through the ITT Exelis facility in Roches- 
ter, New York, a guide held up a flashing red 
light, to warn working Exelis engineers to seal 
their lips in front of people without security 
clearance. 

Their destination was the cavernous clean 
area of Building 1230, where two 2.4-metre 
telescopes, each as big as the Hubble Space 
Telescope and never flown, rested on low ped-
estals. “It seemed almost too good to be true,” says
Dressler, an astronomer at the Carnegie Obser- 
vatories in Pasadena, California. “Things like 
this just don't drop on your doorstep.” 

The unexpected gift comes from the US 
National Reconnaissance Office (NRO), a 
secretive surveillance agency that built the tel- 
escopes to peer down on Earth. In June, NASA 
revealed that the NRO had bequeathed the 
scopes to the space agency because they were 
no longer needed. Now NASA has to figure out 
what it will do with them — and whether it can 
afford the cost of kitting them out with instru- 
ments and sending them into orbit. 

This month, NASA plans to announce a 
science-definition team that will embark on 
that assessment. The team will report by April 
2013 to NASA administrator Charles Bolden 
on the pros, cons and costs of adapting one 
of the telescopes for a mission to investigate 
dark energy, the phenomenon thought to be 
accelerating the expansion of the Universe. 
But astronomers are already encouraged. As 
the veil of secrecy surrounding the telescopes 
lifts, astronomers are beginning to size up the 
devices’ capabilities. And so far, they are lik- 
ing what they see — so much so that they are 
now talking about tacking on an instrument 
that would detect extrasolar planets directly. 
“I think the enthusiasm has only increased as
time has gone on,” says Dressler.

For more on space telescopes, see go.nature.com/vy7sy8

[Figure: ‘Anatomy of a gift’. The National Reconnaissance Office (NRO), a US spy
agency, has given NASA two telescopes, each with a primary mirror as big as that
in the Hubble Space Telescope but with a wider field of view. Field of view:
camera detectors built for the NRO telescope could potentially survey much
bigger swathes of the sky than Hubble’s best cameras.]

The most likely first use for an NRO telescope
is as an alternative to the proposed Wide-Field
Infrared Space Telescope (WFIRST), the top-ranked
mission in the 2010 astronomy decadal
survey. Before the NRO telescopes came along, 
astronomers were aiming for a wide-field 1.3- 
metre survey telescope to search for the imprint 
of dark energy, to find exoplanets and to study 
star-forming regions of the Galaxy. WFIRST 
was not expected to fly until the mid 2020s. But 
with a telescope already in hand, the NRO ver- 
sion of a WFIRST mission could conceivably 
launch at the end of the decade — potentially 
challenging Europe’s Euclid space telescope, a 
dark-energy mission scheduled to fly in 2019.
The NRO telescope suits the WFIRST mis- 
sion because it has a much wider field of view 
than Hubble (see ‘Anatomy of a gift’), which 
makes it perfect for spotting the thousands of 
supernovae and millions of galaxies needed 
to pin down dark energy. And at a workshop 
held last month at Princeton University in 
New Jersey, Gary Matthews, the director 
of astronomy at ITT Exelis, released data 
showing how smoothly the telescope’s mir-
ror had been polished. “The mirror is about
as good as Hubble,” he says.

The telescope’s supporting structure is 
made of a resin that resists distortions caused
by temperature changes, which will help keep 
the main mirror stable. Active control of a 
secondary mirror could adjust it to correct for 
any distortions due to the main mirror, further 
sharpening the telescope’s optics. 

“This telescope was clearly designed to pro- 
duce very stable images,” says Matt Mountain, 
director of the Space Telescope Science Insti- 
tute in Baltimore, Maryland. And extremely 
stable images will be crucial for the success of 
one particular dark-energy technique — weak 
lensing — which looks for subtle distortions in 
the shapes of galaxies due to intervening dark 
matter (see Nature 489, 190-191; 2012). 

Astronomers acknowledge that the tele- 
scope will not be ideal for studying the most 
distant galaxies in the Universe, which are vis- 
ible only in the infrared. That requires a system 
for cooling the mirror but, unlike the proposed 
WFIRST instrument, the NRO telescope is 
designed to work at room temperature. How- 
ever, its larger mirror will have so much more 
light-gathering power that it will be able to spot 
many more faint objects nearer to hand, even 
if it can’t stare as deeply into space.
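
The gain in light-gathering power can be put in rough numbers. As a back-of-the-envelope sketch, assuming ideal unobstructed circular apertures (an assumption not made in the article; real telescopes lose some collecting area to the secondary mirror and its supports), collecting area scales with the square of the mirror diameter, so the 2.4-metre NRO mirror compares with the proposed 1.3-metre WFIRST mirror as follows:

```latex
\frac{A_{\mathrm{NRO}}}{A_{\mathrm{WFIRST}}}
  = \left(\frac{D_{\mathrm{NRO}}}{D_{\mathrm{WFIRST}}}\right)^{2}
  = \left(\frac{2.4\ \mathrm{m}}{1.3\ \mathrm{m}}\right)^{2}
  \approx 3.4
```

On that simplified basis the larger mirror would collect a little more than three times as much light in the same exposure time. The figure is indicative only and says nothing about detector sensitivity or operating temperature.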


COSTS AND BENEFITS 

Some astronomers, however, are questioning 
whether the value of the free hardware — each 
NRO telescope is worth at least US$250 mil- 
lion — can compensate for the extra costs 
entailed in going from a 1.3-metre mission 
to a 2.4-metre mission, which will require a 
larger rocket and a larger camera. Although 
the WFIRST mission was expected to cost 
$1.5 billion, one NASA estimate puts the NRO 
option at $1.75 billion. 

But Princeton astronomer David Spergel, 
who organized the workshop, believes that 
that figure underestimates the savings to be 
made by using the NRO scope. Not having to 
cast and polish a primary mirror avoids a long, 
labour-intensive process requiring an army 
of technicians, he says. Spergel thinks that a 
$1.6-billion mission is realistic. 

He would bump up the cost another 
$200 million, however, to add an instrument 
that could take advantage of the extra light- 
gathering capability of the NRO telescope. 
This would be a coronagraph, which can block 
the light of a star while still revealing the dim 
glow of orbiting planets. WFIRST was going 
to search for exoplanets by detecting gravita- 
tional-lensing events: distortions in the light 
of a background star caused by a large planet 
orbiting another star in the foreground. But 
that method would not easily allow astrono- 
mers to detect planets that orbit close enough to 
be within a star’s habitable zone. Astronomers 
would also prefer to collect a planet's reflected 
light directly. Mountain says that a modern 
coronagraph on NRO-WFIRST might be 
able to detect a Neptune-sized planet. It could 
also survey the dust that shrouds many stars
and that could block small planets from view. 
Knowing how ubiquitous and severe the dust 
problem is would help exoplanet astronomers 
work out just how big their next big ask — a 
mission to directly detect Earth-like planets — 
has to be. 


HUMAN INVOLVEMENT 

One way to reduce the cost of the NRO- 
WFIRST mission for NASA's astrophysics 
division would be to launch it on one of the new 
fleet of rockets that NASA will be eager to test 
at the end of the decade as it moves beyond the 
now-grounded space shuttles. But that would 
involve NASA’s human space programme, an
option that the science-definition team has
been asked to consider. It could mean moving
the mission from its intended orbit around the
Sun (at a dynamically stable spot known as a
Lagrangian point some 1.5 million kilometres
beyond Earth’s orbit) to a geostationary orbit
about 36,000 kilometres above Earth (still
much further out than Hubble). The geostationary
option would be within reach of a wider
variety of rockets, and of potential servicing
missions by astronauts.

Although a Lagrangian point is a better spot 
for astrophysical observations because, for 
example, it lies outside Earth’s radiation belts 
and has a more constant temperature envi- 
ronment, a geostationary orbit would allow 
much higher data-transfer rates, which would 
be particularly important in a survey mission 
accumulating vast amounts of data. 

Astrophysicists believe that they are follow- 
ing the most sensible path for repurposing 
the NRO hardware, but they are aware that 
other space scientists wouldn’t mind
a crack at it too. One idea presented at
the Princeton workshop was to investigate
Earth’s aurora and ionosphere by
observing the edge of Earth from a vantage 
point within the orbit of the Moon. This pro- 
posal might need special approval, however, 
as one of the NRO’s stipulations for donating 
the telescopes was that they would not be used 
to look at Earth. Such a large telescope could 
also be useful in planetary science, to search 
for asteroids likely to pass close to Earth or to 
study the faint objects beyond Neptune’s orbit. 

But there is a second telescope that could 
support these ideas, and there are assorted 
loose parts, including a primary mirror, for 
a third telescope, testimony to the NRO’s lav- 
ish funding. Touring the ITT Exelis facility, 
Dressler was struck by the number of huge 
chambers designed to test space telescopes 
under vacuum. This was a place ready to stamp 
out Hubble-sized telescopes by the dozen, he 
says. “It makes you a little jealous,” he says. “It’s 
kind of neat and kind of sad.”
Across Africa, the need for improved tuberculosis diagnosis and treatment is growing.


BIOMEDICAL RESEARCH 


Foundation opens 
TB lab in Africa 


Howard Hughes Medical Institute sets up shop at ground 
zero for tuberculosis: South Africa’s KwaZulu-Natal. 


BY LINDA NORDLING IN DURBAN 


Patients huddle in dressing gowns outside
the entrance to the King George V Hospital
in Durban, South Africa. It is safer
out here: fresh air limits the spread of disease.
Inside, the sickest lie listless, too tired to get up,
or perhaps beyond caring.

This threadbare hospital is on the front line
in Durban’s war against multidrug-resistant
and extensively drug-resistant tuberculosis
(TB). Its patients, most of whom also have
HIV, undergo gruelling courses of treatment
that last for months or years. Those who get
better may experience side effects such as hearing
loss or psychosis. The unlucky ones die.

For William Bishai, these bleak circumstances
offer a source of hope, through an
unusual research collaboration. The microbiologist
heads the KwaZulu-Natal Research
Institute for Tuberculosis and HIV (K-RITH),
which officially opens on 9 October in a new
building on the campus of the Nelson R. Mandela
School of Medicine at the University of
KwaZulu-Natal (UKZN) in Durban.

The province of KwaZulu-Natal has one of
the highest rates of TB cases in the world: about
3,000 drug-resistant cases are diagnosed each
year. Furthermore, 80% of people diagnosed 
with TB there also have HIV — making the 
province perhaps the best place in the world 
to study the lethal interplay between these dis- 
eases. The dire need, and the opportunity to 
develop new treatments and diagnostics, led 
the Howard Hughes Medical Institute (HHMI) 
in Chevy Chase, Maryland — one of the rich- 
est biomedical research foundations in the 
world — to establish K-RITH as its first labo- 
ratory outside the United States. The HHMI 
will spend about US$75 million on the insti- 
tute over 10 years; the UKZN will contribute 
another $10 million. 

The seven-storey, $40-million K-RITH 
building is close to Durban’s main chest clinic, 
where 15,000 people are screened for TB, HIV 
and other sexually transmitted diseases every 
month. This is where people will be recruited to 
take part in research. “The samples are literally
just outside our door,” says Bishai.

Antibiotics that successfully treat TB have
been around since the 1940s, but interrupted or
incomplete courses of treatment using first-line
drugs such as isoniazid and rifampicin have led 
to an alarming worldwide increase in infec- 
tions with resistant strains of TB (T. Dalton 
et al. Lancet http://doi.org/h8r; 2012). K-RITH 
will focus on developing better treatments and 
faster and more accurate diagnostic tests, not 
only to improve survival, but also to reduce the 
spread of the disease. In addition, the institute 
will study the relationship between TB and 
HIV, which Bishai says is poorly understood. 
“It is clear that TB makes HIV worse and HIV
makes TB worse. But we don’t understand the
mechanism behind this,” he says.

The institute is part of a growing trend in 
Africa to locate research on diseases close to 
the people who have them, says Bishai. Sam- 
ples collected in Africa were once sent to the 
United States, Europe or Asia for study, but this 
often leaves local scientists and patients feeling 
exploited. Co-location of research and patients 
helps to build scientific capacity in the region, 
and encourage locals to participate in studies. 

Eight investigators already work at K-RITH, 
and three are African. “We really beat the 
bushes to find African scientists,” says Bishai, 
who moved from Maryland to Durban last 
year. Ugandan engineer Frederick Balagaddé 
will use microfluidic chips — chemistry sets 
the size of credit cards — to develop improved 
tests for HIV and TB. Adrie Steyn, a South 
African microbiologist, will study how Myco- 
bacterium tuberculosis, the microbe that causes 
TB, fends off attacks by the immune system. 
And Thumbi Ndung’u, a Kenyan virologist, 
wants to find out why some people are less 
susceptible to HIV infection than others. 

The other investigators are European or from 
the United States, but all had to agree to spend 
at least 80% of their time in Durban. One is 
a legend of TB research: Frenchman Jacques 
Grosset was diagnosed with TB at the age of 
25, and after getting out of the sanatorium he 
took up the fight against the disease at the Pas- 
teur Institute in Paris. Over the past 50 years 
he has had a hand in developing most of the 
drug regimens used to treat TB. Now, at the age 
of 83, Grosset has moved from Paris to Dur- 
ban, expecting to finish his long and illustrious 
career using K-RITH’s biocontainment labora- 
tories to study drug-resistant TB in mice. 

K-RITH’s first clinical study is already under 
way, and aims to work out the optimum dose 
of TB drugs to use in children who have both 
TB and HIV. Another begins recruiting this 
month, as part of a global trial of a combina- 
tion of drugs — PA-824, moxifloxacin and 
pyrazinamide — sponsored by the Global 
Alliance for TB Drug Development, a public- 
private partnership in New York (see Nature 
487, 413-414; 2012). 

Having this hive of research activity on their 
doorstep is already raising the hopes of local 
clinicians, says Surie Chinappa, a doctor at the 
Durban chest clinic. “I think it will be positive 
for our patients.”

SARS veterans
tackle coronavirus 


Genome sequence of new virus speeds up testing. 


BY DECLAN BUTLER 


Scientists who helped to fight the 2003
epidemic of SARS (severe acute respiratory
syndrome) have sprung into action
again to investigate the latest threat: a new
SARS-related virus that has killed one man
and left another seriously ill. Last week, the
researchers reported the genome sequence of
the new coronavirus and the first diagnostic
tests to screen for it — two major advances that
will help in efforts to control the pathogen if it
turns into a wider menace.

The SARS virus was identified in March
2003 as the cause of an epidemic that had
emerged in China several months before, and
which had spread rapidly around the world.
It caused nearly 8,500 cases and 916 deaths
before it was finally contained in July 2003. At
the time, scientists knew almost nothing about
the virus — coronaviruses had received scant
attention until then because they had previously
caused little more than colds.

The research and public-health networks
established during the SARS epidemic — and
the body of coronavirus research that followed
— put scientists today in a much stronger
position to understand the latest virus and to
develop countermeasures such as drugs and
vaccines, should they be required. “We are all
collaborating again,” says Christian Drosten,
director of the Institute of Virology at the University
of Bonn Medical Centre in Germany,
who has been involved in developing diagnostic
tests for the pathogen. “This is the old
SARS club.”

So far, there is little evidence that the virus 
poses any major public-health threat. No one 
who came into contact with the two cases has 
fallen ill, suggesting that the virus does not 
spread easily between humans. Nonetheless, 
health authorities worldwide are not being 
complacent — respiratory viruses can cause 
pandemics, and this strain has already caused 
serious disease. The key question now, and 
one that the diagnostics will help to answer, 
is whether the two cases are isolated events or 
whether the virus could strike again and per- 
haps adapt to spread more easily in humans. 

The first case was a 60-year-old man admit- 
ted to the Dr Soliman Fakeeh Hospital in 
Jeddah, Saudi Arabia, on 13 June with severe 
pneumonia and acute renal failure, who died 
on 24 June. Post-mortem tests were negative 
for influenza and the other usual suspect
viruses, so Ali Mohamed Zaki, a microbiolo- 
gist at the hospital, ran a coronavirus test at 
the suggestion of Ron Fouchier, a virologist at 
Erasmus University Medical Centre in Rotter- 
dam, the Netherlands, who had worked on the 
SARS virus. 

On 20 September, Zaki posted his results on 
the Program for Monitoring Emerging Dis- 
eases (ProMED), an online disease-reporting 
system, confirming that the coronavirus tests 
were positive. A week later, Fouchier’s group, 
which had received an isolate of the virus from 
Zaki in early July, published the pathogen’s 
genome sequence in the GenBank database. 
The genome confirmed that the pathogen 
was a new coronavirus, the closest relatives of 
which are found in bats. 

Two days after seeing Zaki’s ProMED post, 
the UK Health Protection Agency reported 
that it had found a second case: a 49-year-old 
man from Qatar who fell ill on 3 September 
with similar symptoms. He was admitted to 
intensive care in Doha on 7 September and 
then transferred to a London hospital, where 
he remains seriously ill. Comparing a fragment 
of the genome sequence of his virus with that of 
the first case showed that the two were identi- 
cal. The viral genome sequence also enabled an 
international group of researchers, including 
Fouchier and Drosten, to quickly devise diag- 
nostic tests that look for short, characteristic 
stretches of the virus’s RNA. The collaboration 
came easily, says Drosten: “The good thing here 
is that these are all friends from the SARS time.” 

Existing SARS research provides a useful 
template for further investigation of this lat- 
est coronavirus, adds Drosten. Scientists will 
use animal models such as mice, ferrets and 
macaques to study the pathogen’s virulence and 
how it spreads, for example. They will also test 
whether antivirals and vaccines developed since 
SARS to treat other coronaviruses are effective. 

A key experiment, he says, will be to find 
where the new virus latches on to the human 
lung. Some scientists suspect that it might 
bind to the angiotensin-converting enzyme 2 
(ACE2) receptor, as did the SARS virus. That 
could be both good and bad news. The recep- 
tor is found deep in the lungs, where infections 
can cause severe disease, but viruses nestling 
there are less apt to be coughed or sneezed into 
the air than are those found higher in the lungs. 

“Receptor-binding properties could also
be crucial to the success of potential control
measures, should they be needed,” says Drosten.
SARS was contained by isolating suspected
cases, partly because it did not spread quickly, 
but also because those it infected became very 
ill before the virus moved into the upper res- 
piratory system. That made cases easy to iden- 
tify before the patients started spreading the 
virus. Flu pandemics, by contrast, are impossi- 
ble to stop, largely because those infected with 
the virus spread it to others for days before they 
show any symptoms of infection.

Misconduct is the main cause
of life-sciences retractions 


Opaque announcements in journals can hide fraud, study finds. 


BY ZOE CORBYN 


Conventional wisdom says that most
retractions of papers in scientific journals
are triggered by unintentional
errors. Not so, according to one of the
largest-ever studies of retractions. A survey published
in Proceedings of the National Academy of Sciences
has found that two-thirds of retracted
life-sciences papers were stricken from the
scientific record because of misconduct such as
fraud or suspected fraud — and that journals
sometimes soft-pedal the reason.

The survey examined all 2,047 articles in 
the PubMed database that had been marked 
as retracted by 3 May this year. But rather than 
taking journals’ retraction notices at face value, 
as previous analyses have done, the study used 
secondary sources to pin down the reasons for 
retraction if the notices were incomplete or 
vague. These sources included investigations 
by the US Office of Research Integrity, and evi- 
dence reported by the blog Retraction Watch. 

The analysis revealed that fraud or suspected 
fraud was responsible for 43% of the retrac- 
tions. Other types of misconduct — duplicate 
publication and plagiarism — accounted for 
14% and 10% of retractions, respectively. Only 
21% of the papers were retracted because of 
error (see ‘Bad copy’).
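The arithmetic behind the "two-thirds" headline figure can be checked with a short sketch. It uses the percentages printed in the 'Bad copy' figure; the variable names are illustrative and are not taken from the study itself.

```python
# Share of retractions by cause, in percent, as printed in the 'Bad copy' figure.
shares = {
    "fraud or suspected fraud": 43.4,
    "duplicate publication": 14.2,
    "plagiarism": 9.8,
    "error": 21.3,
    "other": 11.3,
}

# The article groups these three causes together as misconduct.
misconduct_causes = ("fraud or suspected fraud", "duplicate publication", "plagiarism")
misconduct_share = sum(shares[cause] for cause in misconduct_causes)

print(f"Misconduct: {misconduct_share:.1f}% of retractions")
print(f"Error:      {shares['error']:.1f}% of retractions")
```

Summing the three misconduct categories gives roughly two-thirds of all retractions, against about one-fifth for honest error.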

Earlier studies had found that the percentage
of retractions attributable to error was 1.5–3
times higher. “The secondary sources give a
very different picture,” says Arturo Casadevall,
a microbiologist at Yeshiva University in New
York, and a co-author of the latest study.
“Retraction notices are often not accurate.”

Elizabeth Wager, a UK-based medical writer 
and co-author of a previous study that relied
on journal retraction notices, isn’t surprised 
by the finding of hidden misconduct. “We 
found many notices that seemed deliberately 
obscure or vague,” she says, speculating that 
authors and journals may use opaque retrac- 
tion notices to save face or avoid libel charges. 

The latest study shows a ten-fold increase
(to about 0.01%) in the proportion of papers
retracted owing to fraud since 1975. Previous
analyses have seen a growing trend in
retractions in general, but the latest report
sheds new light on the extent to which fraud is
responsible. It also found a correlation between
journal impact factor and the number of
fraud-induced retractions, says Ferric Fang, a
microbiologist at the University of Washington
in Seattle, who led the study.

TOP TEN RETRACTORS
Journals with the most retractions attributable to
fraud or suspected fraud, as recorded in PubMed.

Journal                                            Number of articles   2011 impact factor
The Journal of Biological Chemistry                37                   12
Anesthesia & Analgesia                             33                   3.07
Science                                            32                   32.45
The Journal of Immunology                          30                   5.86
Proceedings of the National Academy of Sciences    27                   10.47
Blood                                              21                   9.79
Nature                                             19                   36.24
The Journal of Clinical Investigation              17                   15.43
Cancer Research                                    16                   8.16
Cell                                               13                   34.77

BAD COPY
Most retracted papers listed in PubMed were
withdrawn owing to fraud or suspected fraud.

Cause                    Share of retractions
Fraud/suspected fraud    43.4%
Error                    21.3%
Duplication              14.2%
Other                    11.3%
Plagiarism               9.8%

TOTAL RETRACTED ARTICLES: 2,047

Influential journals, including Science, 
Nature, Proceedings of the National Acad- 
emy of Sciences and Cell, all appear in the 
top-ten list of publications with retractions 
because of fraud or suspected fraud (see 
‘Top ten retractors’). For some journals,
including the two topping the table — The 
Journal of Biological Chemistry and Anesthe- 
sia & Analgesia — the tally was boosted by 
multiple retractions from the same few indi- 
viduals, such as anaesthesiologist Joachim 
Boldt, formerly of the Ludwigshafen Clini- 
cal Center in Germany. 


Indeed, Fang and his colleagues found that
38 research groups with five or more retractions
accounted for 44% of articles linked to fraud
or suspected fraud.

Whether the overall rise in fraud-induced 
retractions is the result of an increase in mis- 
conduct, or simply down to more scrutiny, is 
an open question, says Fang. It is also unclear 
whether the high-impact journals have more 
retractions for fraud because they are checked 
more closely, or because they are more likely 
to attract fraudsters. But Fang thinks that the 
large rewards for publishing in leading jour- 
nals — which can range from winning grants 
to receiving tenure — are powerful incentives 
that could be driving some of the trend. “We 
need to look at how we have structured the 
system, so scientists are not given incentives 
to [commit fraud] quite as strongly,” he says.

The survey found some significant geo- 
graphical differences. Retracted papers with 
lead authors based in historical scientific 
superpowers, such as the United States and 
Germany, were more likely to be linked to 
fraud. In emerging scientific powers such as 
India and China, however, plagiarism and 
duplication caused more of the retractions. 
“These trends may reflect differences in incen- 
tives, cultural norms and proficiency in Eng- 
lish among these countries,” says Fang. 

Ivan Oransky, a New York-based journalist 
and co-founder of Retraction Watch, suggests 
setting up a ‘transparency index’ for journals, 
to rank them on criteria such as the clarity of 
their retraction notices. The idea, which he 
says he would be keen to work on, could pro- 
vide a much-needed incentive for journals to 
improve their performance in this area. Data 
from the current study could also serve as a 
basis for a retractions database to help scien- 
tists avoid wasting time trying to replicate or 
build on retracted work, he adds. 

“I’m not necessarily opposed to the idea,
but I have concerns about how such a database 
could be properly maintained and updated,” 
says Fang. “Our study is merely a snapshot. 
Creating an accurate, centralized database that 
could be used as an ongoing resource would be 
a considerable undertaking.”

Don’t let copyright
block data mining 


Matthew L. Jockers, Matthew Sag and Jason Schultz explain why 
humanities scholars have pitched in to the Authors Guild v. Google lawsuit. 


Advances in computer technology
combined with the availability of
digital archives are allowing humanities
scholars to do what biologists, physicists
and economists have been doing for decades 
— analyse massive amounts of data. A far 
richer understanding of literature promises 
to emerge. For instance, large-scale quanti- 
tative projects are forcing scholars to recon- 
sider how literary canons are formed and are 
showing the extent to which authors’ works 
are shaped by factors outside their own crea- 
tive control, such as the period in which they 
lived, their gender and their nationality. 
Yet in the United States, legal action pur- 
sued by the Authors Guild, an advocacy 
group for writers, could bar scholars from
studying as much as two-thirds of the literary
record. A small group of humanities scholars 
(ourselves included) is fighting back. 


CASE HISTORY 

In 2004, Google began scanning and digi- 
tizing books held in prominent US aca- 
demic libraries such as those at Stanford 
University in California and the University 
of Michigan in Ann Arbor, to make these 
collections fully searchable. Currently, 
more than 20 million books, most of which 
are out of print, can be searched at Google 
Books (books.google.com). Unless a book’s 
copyright protection has expired, or the 
copyright owner has agreed to make the 
content freely available, the search engine 
displays just three-line ‘snippets’ from each
book — enough to tell the searcher that the 
listed item is indeed what they are looking 
for. With the right tools, however, data from 
the full text can, in principle, be mined and 
used in large-scale analyses. 

In 2005, the Authors Guild, based in New 
York, with some 8,500 members including 
published authors, literary agents and law- 
yers, filed a class-action lawsuit claiming that 
Google’s scanning activity was a “massive
copyright infringement”. Google, the Authors
Guild and a group of publishers agreed to a 
class-action settlement in 2008. This gave 
Google permission to continue scanning 
and to sell electronic books individually or 
as part of a subscription service. In return, 
Google agreed to share the advertising
revenue from Google Books with authors 
and publishers, and to make one-off pay- 
ments to copyright owners amounting to a 
minimum of US$125 million. 

The settlement was strongly opposed by 
foreign governments, the US Department 
of Justice, the US Copyright Office, authors, 
academics and rival technology companies 
for various reasons. Many feared that it would 
create an unfair monopoly, with Google 
having the sole right to publish millions of 
‘orphan’ works — books whose copyright 
owners cannot easily be located. In 2009, the 
settlement was revised to try to address these 
concerns. But the court rejected the revised 
settlement in 2011, and the legal controversy 
continues. 

In September last year, in a separate case, 
the Authors Guild sued several universities 
for participating in Google’s book-scan- 
ning project. As part of this case, known as 
Authors Guild v. HathiTrust, it is also pursuing
legal action against the HathiTrust Digital 
Library, a service that enables a large consor- 
tium of universities and research libraries to 
store, secure and search their digital collec- 
tions using a shared infrastructure. 

Among the issues at the heart of this dis- 
pute is what researchers in the emerging 
field of digital humanities will be allowed to 
analyse: only public-domain books (mostly 
those published before 1923 in the United 
States), or all known literary works. The 
answer may define the future of the field. 


TO THE BARRICADES 

On 3 August, the Association for Computers
and the Humanities and a group of 64 scholars
(that includes us), from disciplines
ranging from law and computer science to
linguistics, history and literature, filed an
amicus curiae brief on behalf of the digital
humanities. We are urging the court in
Authors Guild v. Google to grant a summary
judgment in favour of Google, a step that
will effectively end the litigation. We filed
a similar brief in the HathiTrust case on
7 July. The judge in the HathiTrust case is
currently considering our submission, and a
decision is expected imminently. The court
in Authors Guild v. Google will consider our
argument as soon as the appeals court deals
with certain procedural issues.

We feel that if the Authors Guild wins the
cases against Google and the HathiTrust, the
ruling could set a dangerous precedent —
that copyright gives authors and publishers
the right to control all, even ‘non-expressive’,
uses of their works that involve copying.
Copyright law has long recognized the
distinction between protecting an author’s
original expression and the public’s right to
access the facts and ideas contained within
that expression. According to the US Constitution,
the purpose of copyright is “To
promote the Progress of Science and useful
Arts”. Preventing authors from monopolizing
facts and ideas allows others to explore
their own creativity and ‘stand on the shoulders
of giants’.

We believe that copyright law is not (and
should not be) an obstacle to statistical and
computational analysis of the millions of
books owned by university libraries. We
are not talking about republishing them or
even quoting from them. We simply want to
extract information from and about them to
sift out trends and patterns.

KNOWING YOUR SUBJECT
In a network of more than 3,000 nineteenth-century novels, arranged according to how much they share
certain stylistic and thematic properties, books authored by men (blue) tend to cluster separately from
those authored by women (white). George Eliot’s works (yellow) are an exception.

As an example, clustering more than 3,000
nineteenth-century novels according to how
much they share certain stylistic properties
(specific words and punctuation marks) and
thematic features (such as groups of commonly
co-occurring words) has thrown up findings
that would be hard to glean from reading a 
handful of books individually. One is that 
books authored by men tend to cluster quite 
distinctly from books authored by women 
(see ‘Knowing your subject’). This illustrates 
the degree to which gender determines the 
choices made by writers, but also flags up out- 
liers. For instance, within this clustering, the 
works of George Eliot (real name Mary Anne 
Evans) sit firmly among those of male writers. 
In other words, such ‘macroanalytic’ method- 
ology gives researchers a way to see individual 
authors and publications within the context of 
a much larger system. 

Authors’ rights deserve protection. And 
governments and the various stakeholders 
involved may eventually work out how to 
achieve the full potential of digital libraries 
in a way that is fair to writers, readers and 
providers. But digitizing books for ‘non- 
expressive’ uses, such as basic searching and 
text mining, is a separate issue and should 
not be barred on the basis of concerns over 
copyright. An independent review last year 
of intellectual property and growth com- 
missioned by the British government came 
to a similar conclusion. Unauthorized
music-file sharing can infringe copyright 
because humans ultimately experience 
those files as musical works. Scanning 
words from library books to make a search 
index, or to compile a list of word frequen- 
cies, does not interfere with the rights of the 
author. These uses simply convert masses of 
text into metadata. 
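To make the idea of a 'non-expressive' use concrete, here is a minimal sketch of turning text into a word-frequency list. The function name `word_frequencies` and the sample sentence are illustrative assumptions; this is not the tooling used in the research described above.

```python
import re
from collections import Counter


def word_frequencies(text: str) -> Counter:
    """Count how often each lower-cased word occurs in `text`."""
    # Treat runs of letters and apostrophes as words; drop everything else.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


# Placeholder text standing in for a scanned book.
sample = "It was the best of times, it was the worst of times."
freqs = word_frequencies(sample)

# Only counts survive; the original word order cannot be recovered from them.
print(freqs.most_common(3))
```

The output is metadata about the text, a table of counts, rather than a readable copy of the work.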

It is time for the US courts to recognize 
explicitly that, in the digital age, copying 
books for non-expressive purposes is not 
infringement. Courts have already applied 
this logic in analogous cases: Google, Micro- 
soft and others copy web pages to feed into 
their Internet search engines; the online ser- 
vice Turnitin copies exam papers and other 
sources so that plagiarism can be detected. 
These practices have been challenged and 
found to be legal under copyright law. 

It is crucial for future research that the 
right precedent be set. We hope that the 
judges decide that digitization for text 
mining and other forms of computational 
analysis is, unequivocally, fair use.

Max Perutz, James Watson, John Kendrew and Francis Crick talk to a BBC presenter (centre) about
their Nobel prizes in 1962. 


What makes 
a great lab? 


William Bynum reflects on the factors that have 
brought nine Nobel prizes to the UK Laboratory 
of Molecular Biology in Cambridge. 


What makes an outstanding
laboratory? There have been a
number of these special places
during the past couple of centuries, but
none more so than the Laboratory of 
Molecular Biology (LMB) in Cambridge, 
UK, which is celebrating 50 years since it 
got its own building and, this week, 50 years 
since four of its scientists were awarded 
Nobel prizes — Max Perutz, John Kendrew, 
James Watson and Francis Crick. 

Overall, the LMB can claim nine Nobel 
prizes for 13 staff scientists during its illus- 
trious history, plus another eight for those 
who trained or worked there temporarily. As 
a unit within the Cavendish Laboratory at 
the University of Cambridge, the LMB had 
a distinguished pre-history. Few would have 
dared to predict that its independent exist- 
ence would be so successful and productive. 
Seen historically, however, it shares charac- 
teristics with other outstanding laboratories. 
These include new methods of producing 
scientific knowledge, novel approaches to
training researchers, the innovation and
excitement that surround an emerging sci- 
entific field and, perhaps most centrally, the 
presence of a gifted individual with the per- 
sonality and vision to make things happen. 
The LMB can thus be seen as a modern 
exemplar of a species that goes back to the 
early nineteenth century. Previous centres of 
excellence had differing ambiences, which 
reflected the scientific cultures of their times. 
But all share a few important star qualities. In
search of these, here I briefly examine three of 
the LMB’s ancestors and its parent laboratory. 


LABORATORY LIFE 
Before the nineteenth century, most labora- 
tories were places in which single individuals 
worked, sometimes aided by an assistant or 
two. Chemist Antoine Lavoisier (1743-94), 
with his wife at his side, was typical. Wives 
and servants often helped. Justus von Liebig 
(1803-73) changed this pattern definitively. 
Liebig’s chemistry laboratory at the Uni- 
versity of Giessen in Germany opened in 
1826 and acquired international renown.
It attracted students from all over Europe 
and earned Liebig a reputation as a ‘chemist
breeder’. His lab was an early example of the
research and teaching establishments that 
made the German universities the envy of 
many. It began as a single room, with a fire 
in the middle surrounded by work benches. 
Liebig’s work on the compositions of chemical 
substances and their reactions was outstand- 
ing, and his focus on agricultural, industrial 
and biological issues gave his research a highly 
topical flavour. 

He trained his protégés carefully, espe- 
cially in qualitative analysis. Students flocked 
to him, with the result that European chem- 
istry during the middle of the nineteenth 
century bore a distinctly Liebigian flavour 
as his students moved to influential positions 
elsewhere. The identification of state-of-the- 
art problems and the training of students to 
solve them characterize his achievements. 
Liebig’s initiative was widely adapted in the 
natural and biomedical sciences throughout 
the German university system. 

Training was also part of the brief of 
physiologist Ivan Pavlov (1849-1936), but 
he brought his own organizational genius 
to bear on the ‘physiology factory’ that he 
masterminded in St Petersburg, Russia, 
famously studying dogs. He adapted aspects 
of manufacturing to the production of sci- 
entific knowledge. Pavlov’s staff were among 
the first to specialize in different tasks: surgi- 
cal, chemical, dog handling. The dogs were 
also specialized. Many had permanent gastric 
fistulas; some had oesophageal or pancre- 
atic ones or other surgical interventions that 
allowed Pavlov to examine physiology in situ. 

Pavlov’s ‘laboratory’ eventually occupied a 
whole building — already nearer to the modern
usage of the word, and a far cry from the
single rooms of most researchers of earlier 
times. As science has become more complex 
and cooperative, so the physical structures 
of laboratories have evolved faster than the 
language we use to describe them. 

The next example reinforces this point: 
Thomas Hunt Morgan (1866-1945) and his 
Fly Room at Columbia University in New 
York. It was more than just a room. Modern 
experimental genetics was born there, with 
the fruitfly (Drosophila melanogaster) as the 
prize experimental subject. The fly’s rapid 
breeding time and four large chromosomes 
made it ideal for examining how chromo- 
somal events during meiosis and mitosis 
relate to the structural features of the adult. 
A chance finding of a fly with a white eye —
instead of the usual red — led Morgan to the 
importance of the sex chromosome. 

Morgan was a gifted scientist who sur- 
rounded himself with equally gifted students 
and postdoctoral researchers, including 
Alfred Sturtevant, Calvin Bridges and 
Hermann J. Muller. Although Morgan was a 
patrician from the US South, he ran his lab on
egalitarian lines, with the consequence that 
historians still debate the relative contribu- 
tions of the different parties. Morgan won 
a Nobel prize by himself in 1933, although 
he divided the money with Sturtevant and 
Bridges, to help to educate their children. 
Only Muller (who won his own Nobel in 
1946 for his work on the effects of radiation 
on mutation rates) suggested seriously that 
Morgan sometimes exploited his students. 
Most believed that the free interchange and
mutual devotion to uncovering the genetics of
the fruitfly was a formula that worked.

Liebig, Pavlov and Morgan each cre- 
ated something special. Their labs achieved 
international prominence, attracted talented 
scientists and bred further success. Each lab 
bore the stamp of the founder’s ambitions and 
personality, and this relationship between 
the boss and the establishment stands out. 
Morgan’s culture of egalitarianism provided
a model for many successful modern labora- 
tories, not least the LMB. 


LAB SPECIATION 

A kind of speciation can sometimes occur 
with laboratories. What is now the LMB 
began life ensconced in the Cavendish Labo- 
ratory, the centre of physics at the University 
of Cambridge and, by any reckoning, also 
among the most successful modern labs. 

The Cavendish opened in 1874. Its first 
director, James Clerk Maxwell (1831-79), 
was arguably the most important physicist 
between Newton and Einstein. A genial man 
blessed with a fertile mind and remarkable 
ingenuity, Maxwell contributed to many 
problems in physics, and he completed the 
work on electromagnetism begun by Hans 
Christian Ørsted, Michael Faraday and others.
He showed that the Sun's light comes to
us through electromagnetic waves, and at the 
same time predicted the range of radiations 
that has been central to modern science and 
modern life. 

Many of these advancements came from 
the Cavendish, beginning with J. J. Thomson 
(1856-1940) and his discovery of the electron 
in 1897. His was one of many Nobel prizes 
from the Cavendish, and like all leaders of 
successful laboratories, he was a good talent 
spotter. And successful labs attract ambitious 
and talented individuals. Ernest Rutherford 
(1871-1937) came from New Zealand to 
the Cavendish because of Thomson and his 
group. He thrived there, and after stints in 
Montreal, Canada, and Manchester, UK, he 
succeeded Thomson as director in 1919. He 
brought with him from Manchester James 
Chadwick, rooting nuclear and atomic phys- 
ics firmly in the Cavendish during the early 
decades of the twentieth century. 

Until it got its own building in 1962, the 
LMB was simply a research unit within the 
Cavendish. Of its early workers, the most
important was Max Perutz (1914-2002),
whose style and personality shaped the LMB. 
Perutz came to England in 1936, hoping to 
work with Frederick Gowland Hopkins, the 
pioneer Cambridge biochemist. A meeting 
with the X-ray crystallographer J. D. Bernal, 
then still at the Cavendish, convinced the 
young Perutz that X-rays could provide the 
tools to solve the molecular structures of 
proteins. It took Perutz a further year to get 
horse haemoglobin crystals that were suit- 
able for analysis using X-ray diffraction tech- 
niques. Because of the Second World War, it 
was seven more years before he could return 
to this molecule, his life’s work. Even though 
he had lived in England for several years 
before the outbreak of war, he was treated 
as an enemy alien and incarcerated for nine 
months during 1940, in Britain and Canada. 
He spent the rest of the war back in Britain 
designing aircraft carriers. 

Encouraged by Lawrence Bragg, then 
director of the Cavendish, Perutz returned 
to studying haemoglobin, joined by John 
Kendrew (1917-97). The beginnings of the 
LMB date from 1947, when the UK Medical 
Research Council (MRC) began supporting 
the work of Perutz and Kendrew. The origi- 
nal name of their group was the MRC Unit 
for Research on the Molecular Structure of 
Biological Systems. Perutz described himself 
then as a chemist working in a physics laboratory
on a biological problem: a fairly accurate
summary of the inputs into the field dubbed
‘molecular biology’ in 1938 by the Rockefeller
Foundation administrator Warren Weaver. 

Perutz and Kendrew pursued a promis- 
ing avenue of molecular biological research, 
but haemoglobin is such a complex model 
that they soon added the simpler myoglobin
to their agenda. Hugh Huxley joined
the group in 1948, but turned to studying
the biophysical dynamics of muscle
contraction: an early example of the widening
range of biological problems in the unit’s
remit. The increasing
international reputation of the group’s work
brought talented young scientists to Cambridge,
including physicist Francis Crick as a
research student and biologist James Watson
as a postdoc.

A successful laboratory generally breeds 
more success, with implications for size and 
ethos. Bragg was appreciative of the group’s 
prominence, but the Cavendish was chroni- 
cally short of space in post-war austerity Brit- 
ain. So in 1957 Perutz, aware that Frederick 
Sanger from the biochemistry department 
also needed more space, wrote to the MRC 
about housing the molecular biologists in a 
new laboratory. Unsurprisingly, negotiations 
were slow given so many vested interests, but
money was found. The present building, in
Hills Road, Cambridge, expanded on more 
than one occasion and, still growing, was 
opened by Queen Elizabeth II in May 1962. 
There were then about 25 staff members 
and the same number of visiting workers. In 
October, there was a party to celebrate the 
Nobel prizes of Perutz and Kendrew in chem- 
istry, and Watson and Crick (with Maurice 
Wilkins) in physiology or medicine. 


BRIGHT BEGINNINGS 

The party in 1962 was just the beginning. 
Over the past half-century the LMB has been 
at the centre of molecular biology, the dis- 
cipline at the heart of the life sciences. New 
groups in developmental biology, immunol- 
ogy, cell biology and neurobiology attest to 
the expansion of the field. The growth has 
often been opportunistic, clustered around 
individuals such as Sydney Brenner, César 
Milstein, Aaron Klug and Michel Goedert. 
The simpler rules of only a generation ago 
(the original procedures for producing 
monoclonal antibodies were not patented, 
for instance) have given way to the contem- 
porary competitive world of biotechnology. 

As the laboratory has grown, its admin- 
istrative structure has inevitably become 
more complex. Until Perutz retired in 1979, 
it had no director. Perutz didn’t want to be 
one, and it meant he could retain his lab 
space after retirement. Instead, the lab had 
a loose management committee, which met 
occasionally and saw its main job as attract- 
ing outstanding talent to the lab. Perutz kept 
the bureaucratic structures of the laboratory 
minimalist, and until 1973 a single admin- 
istrator, Audrey Martin (and her dog Slip- 
pers), looked after things. The egalitarianism 
that Morgan had fostered at Columbia was 
effectively duplicated in Cambridge, an 
ethos encouraged by the lab’s successive 
directors — Sydney Brenner, Aaron Klug, 
Richard Henderson and Hugh Pelham — as 
each has presided over an ever-larger opera- 
tion. The LMB now has some 400 workers, 
about half permanent staff and the rest stu- 
dents and visiting scientists. 

It is easier to describe success than to 
explain it, but several of the characteristics 
that typified earlier exemplars are also part 
of the core ideology of the LMB. New tech- 
niques, new disciplines, new ways of tackling 
old problems and the ethos of collegiality 
have continued to characterize the lab. So has 
the key personality of Perutz, whose influence
shines even after his death. The achievements
of the lab’s first half-century have been
rewarded by even more new buildings, due to 
open in 2013. Long may the LMB flourish.

Researchers in Côte d’Ivoire bag the skull of a colobus monkey to check for pathogens.


Fatal exchange 


Nathan Wolfe applauds a tome on interspecies disease 
transmission that mixes research with human stories. 


The exchange of microbes between
humans and animals — zoonoses
— began to fascinate me nearly
20 years ago, when I was studying wild 
primate populations. There are a remark- 
able number of zoonotic agents, ranging 
from anthrax to HIV, West Nile virus and 
influenza. It seemed shocking even in the 
early 1990s that this vastly important class 
of microbe was considered, if at all, to be a
funny-sounding boutique niche in biology. 

Fortunately, times have changed. Zoon- 
oses have come of age in terms of scientific 
and public recognition. We now know that 
these agents are the most likely source of 
future pandemics. The past few years have 
seen an explosion in technical literature, gov- 
ernment-sponsored research programmes, 
philanthropic interest and even films such 
as Contagion. During the past year, several 
general treatments of zoonoses have been 
published, including Jacques Pepin’s The 
Origin of AIDS (Cambridge University Press, 
2011), Craig Timberg and Daniel Halperin’s 
Tinderbox (Penguin, 2012) and my own The 
Viral Storm (Allen Lane, 2011). 

Now comes Spillover by David Quammen,
one of that rare breed of science journal- 
ists who blend exploration with a talent 
for synthesis and storytelling. Quammen’s 
excellent The Song of the Dodo (Prentice 
Hall & IBD, 1996), on island biogeography,
is difficult to top, but
Spillover comes close. 
This is a timely, seri- 
ous and impressive 
work that marks the 
maturation of a field 
of microbiology. 
Quammen takes us 
into the field, offer- 
ing an idea of research 
challenges in remote 
hotspots of disease 
emergence such as 
parts of Gabon and 
Malaysian Borneo. His 
narrative on the ‘cut 
hunter’ theory, based on our understanding 
of how HIV began, presents a realistic sense of 
the first person infected with the chimpanzee 
simian immunodeficiency virus (SIV) that 
would become HIV. It is the kind of portrayal 
that generalists and specialists have waited for. 
In researching Spillover, Quammen visited 
scientists such as malaria researchers Janet 
Cox-Singh and Balbir Singh, and travelled 
to remote research sites in China, Bangla- 
desh and beyond. He interviewed the usual 
suspects at leading international universities 
(myself included, in the context of pandemic 
prevention). But Quammen also sought out 
oft-neglected scientists and fieldworkers 
in places such as the Democratic Republic
of Congo. These doughty professionals do
considerable scientific heavy lifting, negotiating
political and sometimes security
hurdles to collect the samples that form the
backbone of zoonosis research. Refreshingly,
Quammen also highlights a number of
up-and-coming young researchers in the field.

Spillover: Animal Infections and the Next Human Pandemic
DAVID QUAMMEN
W. W. Norton: 2012. 592 pp. $28.95, £20

The book gives a dutiful review of the range 
of research on disease ecology and zoonosis, 
the subfields and their central concepts. But 
the behind-the-scenes stories are particu- 
larly enjoyable. Quammen brings to light the 
details left out of papers and technical talks; 
even workers in the field will find new stories. 
For example, his accidental encounter and 
subsequent interviews with survivors of the 
Mayibout Ebola epidemic in Gabon provide 
first-person perspectives on the epidemic and 
the hunting events that led up to it that are 
absent from any other account I’ve read. 

Specialists will find little new in terms of 
scientific concepts. Nevertheless, some dis- 
cussions, such as that on mathematical mod- 
elling in chapter three, push scientists towards 
the next level in their research. There are 
some minor technical errors, such as the age 
of the recombination event of monkey SIVs 
that produced SIVcpz, the variant that infects
chimpanzees; Quammen suggests that it may 
be only hundreds, rather than thousands, of 
years old. And some will bridle at a few over- 
simplifications — for example, the suggestion 
that the H5N1 flu virus is inherently more 
worrying than H1N1, which among other 
things belies the potential for reassortment 
between the two. But these are quibbles. 

Importantly, Spillover challenges those 
working in the area of disease ecology and 
emerging infectious diseases to ask: what 
is next? Quammen concludes by reviewing 
some programmes that have been devel- 
oped by bodies such as the US Department 
of Defense to understand and address the 
threats that zoonoses pose for the increasingly 
susceptible human population. These include 
the US Agency for International Develop- 
ment’s Emerging Pandemic Threats Pro- 
gram, which aims to identify risks early and 
develop global capacity to stop them before 
they spread. Quammen rightly believes that 
such ventures are the way forward. 

These efforts, and what they will evolve 
into, may be pivotal to human survival. As 
Quammen points out using the example of 
gypsy moths (Lymantria dispar) and nucleopolyhedroviruses,
other species have
succumbed to pathogens following explosive 
population growth. A big question looms 
over all of us. How will humans fare?

NASA's Spirit rover landed in the Gusev Crater on Mars in January 2004.


MARS EXPLORATION 


Roving the red planet 


It is people who drive Curiosity and other robot 
missions on Mars, reminds Jim Bell. 


There are Martians among us. Hundreds
of them, living on Earth but
working on Mars, telecommuting to
and fro as part of the interplanetary science 
and exploration programme conducted by 
NASA and other space agencies. These Mar- 
tians are the men and women who help to 
control robots such as the Mars rovers on the 
red planet’s surface, or any of the three sat- 
ellites orbiting Mars and acquiring remote- 
sensing data. But both the media and the 
public tend to anthropomorphize the rovers 
as the explorers in this frontier world. 
Cognitive scientist William Clancey, based 
at NASA’s Ames Research Center in Moffett 
Field, California, works on how humans 
and societies adapt their lives and work to 
computers. During the early years of the 
Spirit and Opportunity missions, Clancey 
embedded himself in the Mars Exploration 
Rover (MER) science and engineering team 
at NASA’s Jet Propulsion Laboratory in Pasadena,
California. His Working on Mars documents
firsthand many of the highs and lows
of the people trying to carry out remotely 
controlled field science over a vast distance 
using mobile, programmable laboratories. 
By observing these Martian “natives” and 
interviewing a subset of the science and 
engineering teams, Clancey captures some 
of the complex inner workings of a modern 
scientific expedition that just happens to be
on another planet. And he draws analogies
to eighteenth- and nineteenth-century expe- 
ditions such as those of James Cook and 
Alexander von Humboldt. 

The rovers’ missions have become major 
media and sociological phenomena. The 
starry roster now includes the 1997 Mars 
Pathfinder Sojourner rover, the MER Spirit 
and Opportunity, which landed in 2004 
(Opportunity is still active), and NASA’s 
newest interplanetary vehicle, the car-sized
Curiosity rover that landed in early August.
Millions of people worldwide, including
educators and their students, have followed
the missions online — an opportunity itself
made possible mostly through the commitment
of the rover science teams and NASA
to share the adventures as widely as possible.

The team in charge of Curiosity celebrates the first pictures sent back from the red planet.

So the anthropomorphizing — which hap- 
pens even among scientists working with 
the rovers — is perhaps not surprising. The 
robots are seen as having human-like senses: 
vision (cameras), touch (arms and drills) and 
taste (chemical analysis), as well as mobility 
(wheels). They are viewed as “plucky” or 
“intrepid”, and during their missions “struggle”
and “make discoveries”. Among my
colleagues on the rover science teams, it is 
common to refer to them as “robot geolo- 
gists” or, in the case of Curiosity, a “robotic 
astrobiologist”. Ultimately, even Clancey has 
to admit that casting the robots as valiant 
explorers in a dangerous land has heightened 
public interest in them — and perhaps even 
boosted public and Congressional support 
for NASA and its planetary-science missions. 

Clancey underlines the importance of 
the unique and unprecedented sociotech- 
nological fusion of remote sensors and the 
organizational structure, tactical-operations 
processes and strategic-planning tools pio- 
neered by the MER team (many by trial and 
error and learning on the fly). Together, these 
promote the agency of the scientists and 
engineers and boost their capacity to actively 
and efficiently engage in scientific fieldwork 
in remote environments and situations. 

One prime example is the way in which
the MER planning […] topography of the
rover’s environment. Using previous days’
imaging, it produces a fully functional
computer-graphics representation of the
vehicle in a virtual-reality environment;
the result is almost like an interplanetary
video game. The inter[…] highly intuitive. It
helps remote science
team members to imagine being there in
the field, which, in turn, helps to enable (for
example, through guiding the rover’s drives
or the positioning of the instruments on specific
parts of rocks or soils) many of the kinds
of measurements that team members would
actually make themselves if they were there.

Working on Mars: Voyages of Scientific […] £20.95

This fusion provides a general model 
for successful future robotic, and human, 
remote field science. For example, the Curi- 
osity mission's science and engineering team 
is using many of the operations and plan- 
ning tools and structures of the MER mis- 
sion, although necessarily modified for the 
specific attributes and constraints of that 
vehicle, landing site and mission objectives. 
Remote under-sea expeditions, as well as 
many other applications of telerobotics and 
telepresence, could similarly benefit from 
aspects of the MER model. 

The human-centred computing aspects 
of missions such as those of the Mars rovers
have profound implications for the future 
of space exploration in general. For exam- 
ple, during the past few decades it has been 
popular to incite “humans versus robots” 
debates between advocates of crewed space- 
flight and of robotic exploration. As Clancey 
makes clear, the debate is moot. 

Humans are already exploring the Solar 
System using the tools of robotics and the 
methods of field science. Robotic compan- 
ions — “intrepid” or otherwise — are abso- 
lutely going to be part of humanity’s eventual 
in-person exploration of our planetary 
neighbours. It is not going to be humans ver- 
sus robots, but humans and robots, working 
together.

Books in brief


Secrets of the Ice: Antarctica’s Clues to Climate, the Universe, and 
the Limits of Life 

Veronika Meduna YALE UNIVERSITY PRESS 232 pp. £29.95 (2012) 
Bleak and storied wonderland it may be, but Antarctica is also the 
world’s biggest lab, where hundreds of scientists cluster to take the 
planet’s pulse. Science writer Veronika Meduna follows the action
in this homage to frontier science. Beginning with geological and
glaciological findings on Antarctica’s climate history, she segues 
into marine life, the microbes of the cold deserts and the fugitive 
‘hum’ of the Big Bang chased by physicists and astronomers on the 
continent. A beautifully illustrated journey through ‘Mars on Earth’. 


The Book of Barely Imagined Beings: A 21st Century Bestiary 
Caspar Henderson GRANTA 336 pp. £25 (2012) 

Award-winning writer Caspar Henderson read Jorge Luis Borges’s 
The Book of Imaginary Beings (1967) and realized that nature’s 
creations often trump the fantastical for sheer surreality. Henderson’s 
mainly marine beasts are a dazzling catch. The “genital fingered”, 
gherkin-sized stomatopod Gonodactylus smithii, for instance, uses 
specialized limbs for defence — delivering enough force to break
a bone. Eels, whales, arachnids and more are examined, with
Henderson’s central concern the survival of all this glory in the midst 
of the biodiversity drain. Wittily illustrated by Golbanou Moghaddas. 


Guesstimation 2.0: Solving Today’s Problems on the Back of a Napkin
Lawrence Weinstein PRINCETON UNIVERSITY PRESS 377 pp. £13.95 (2012)
This follow-up to the popular Guesstimation offers more on the joy
of mathematical estimation, and inspiration for the budding analyst.
Physicist Lawrence Weinstein trawls questions from the pragmatic to
the bizarre. Among them are his probings of energy, transportation
and recycling, such as gauging the US plastic-bag pile-up on the basis
of hydrocarbon use. He also covers the senses, heavenly bodies, 
radiation — and the amount of urine in public swimming pools. 


Dangerous Work: Diary of an Arctic Adventure
Arthur Conan Doyle (Eds Jon Lellenberg and Daniel Stashower) BRITISH LIBRARY PUBLISHING 368 pp. £25 (2012)
Who knew that Arctic explorers lauded the creator of fiction’s most
famous sleuth for his own detective work on routes to the North Pole?
Arthur Conan Doyle — author of Sherlock Holmes — published the
data in the article ‘The Glamour of the Arctic’ after a youthful stint as
ship’s surgeon on a Greenland whaler. His diary of the 1880 voyage
is here reproduced in facsimile, with published pieces inspired by the
trip. Hair-raising incidents abound, from a sudden on-board death by
peritonitis to the young medic’s periodic falls into ice-strewn waters.


The Real Story of Risk: Adventures in a Hazardous World 

Glenn Croston PROMETHEUS BOOKS 256 pp. £16.99 (2012)

Imagine this: you find yourself worrying about shark attacks while 
crossing a busy road in a daze. Our perception of risk and the reality 
are frequently at odds, biologist Glenn Croston argues in this jazzily 
written exploration of the balance between risk and reward. Croston 
marshals a raft of research on why our view of the phenomenon
is so skewed, delving into evolutionary roots, our denial of ‘slow’
catastrophes, the role of lust in colouring our judgement, our need 
to belong and much more.

| COMMENT | BOOKS & ARTS


Supercomputers such as that at CERN in Switzerland are becoming faster at a predictable rate. 


NETWORK THEORY 


The regularities of facts 


Carl Bergstrom assesses the power of scientometrics in 
predicting the shifts and shelf-life of knowledge. 


How much of what you know today
will be true tomorrow? In a week's
time? In a decade? The weather
forecast may change overnight; our esti- 
mate of the number of genes in the human 
genome may change in the coming months; 
our understanding of consciousness may be 
radically different a century from now. 

Knowledge shifts over time, explains Sam 
Arbesman in The Half-Life of Facts, and it 
does so in predictable ways. The book takes 
us on a whirlwind tour of emerging fields 
of scientometrics, and undertakes a broader 
exploration of metaknowledge. Arbesman 
details how researchers beginning to focus 
the big-data lens back on science itself are 
uncovering quantitative laws and regulari- 
ties in the way that scientific knowledge is 
constructed and modified over time. 

Like the decay of atoms, individual 
discoveries may be difficult to predict, but 
in the aggregate, facts change in highly regu- 
lar ways. To illustrate this point, Arbesman 
ranges widely through the scope of human 
knowledge, drawing on examples from phys- 
ics and chemistry, technology and medi- 
cine, sociology and cultural studies, and 
even the arts and humanities. For example, 
over time there is a predictable regularity to 
how measurement errors get smaller, how 
computation and travel become faster, how 
innovations diffuse through social networks
and how technological advances drive
increases in human populations. Even the
magnetic permeability of iron has increased
in a consistent manner with changes in
smelting technology.

Arbesman defines 
facts loosely, not as 
objective truths but as 
little pieces of knowl- 
edge, right or wrong. 
This casts a broad net 
over facts of many dif- 
ferent kinds, at the risk 
of obscuring interest- 
ing and important distinctions. 

Scientific ‘facts’ about the natural world 
— the nature of an electron or the evolutionary
significance of a peacock’s tail — change
as science progresses and our explanatory 
frameworks shift. Statistics as facts — bat- 
ting averages, gross national products, 
crime rates — change not because the body 
of knowledge around them changes, but 
because the world is changing beneath our 
feet and new events are transpiring. 

So statistical facts need not be as inter- 
connected as scientific facts; for example, 
Guinness World Records is a catalogue of
independent assertions. In 2011, Zac the
macaw set the record for the most basketballs
slam-dunked by a parrot in one minute;
meanwhile, the world record for the fastest
100-metre hurdles wearing diving fins
remained unchanged at 14.8 seconds.

The Half-Life of Facts: Why Everything We Know Has an Expiration Date
SAMUEL ARBESMAN
Current: 2012. 256 pp. $25.95, £18.99

But in proper network-theorist fashion, 
Arbesman focuses more on the similarities 
than on the differences between these dis- 
tinct types of fact and processes of change. 
Both kinds are generated at predictable rates, 
change in predictable ways and are subject to 
scientometric analysis. 

One quibble I have with the book is that 
occasionally I feel Arbesman’s enthusiasm 
gets the better of him, and he accepts the 
conclusions of sound-bite science with- 
out adequate scrutiny. For example, if 
approximately 80% of citations are copied 
from earlier citations of the same material 
(M. V. Simkin and V. P. Roychowdhury 
Complex Sys. 14, 269-274; 2003), can we 
join Arbesman in the presumption that 
researchers read only 20% of what they cite? 
Or might this pattern arise because authors 
find it easier to compile their bibliographies 
from other reference lists — irrespective of 
whether they have read these papers? 

Overall, however, Arbesman is a delight- 
ful guide to the territory, patently in love 
with this emerging field. He is also a skilled 
storyteller, and his wide-eyed reporting 
invigorates material that could have been 
dry and academic. 

The chapter on hidden knowledge 
deserves particular note. It addresses one of 
the most pressing scientific problems we face: 
how to make vital new connections among 
ideas. In an era in which exhaustive reading 
is no longer possible and library-shelf brows- 
ing is infrequent, how can we design mecha- 
nisms that connect scholars with the ideas 
that they need to move forwards? 

Arbesman hints at possible elements of 
a solution: innovation prizes, social tag- 
ging, systematic meta-analyses and auto- 
mated discovery programs. Big changes are 
coming in the very near future, driven by 
the confluence of the digital revolution in 
publishing, the explosion of computational 
capacity and the accumulated strain of a 
350-year-old system of scientific commu- 
nication pushed to the breaking point in an 
exponentially larger world. 

The current generation will solve these 
problems, and change how science as a 
social and communicative process is practised.
Reading The Half-Life of Facts, I
became excited about the prospect of living
through — and perhaps even contributing
to — this change.


Carl Bergstrom is a network theorist and 
professor of evolutionary biology at the 
University of Washington in Seattle. 
e-mail: cbergst@u.washington.edu 


P. GINTER/SCIENCE FACTION/CORBIS 


Correspondence 


Biodiversity needs a 
scientific approach 


We agree with Esther Turnhout 
and colleagues (Nature 488, 
454-455; 2012) that the 
Intergovernmental Platform 
on Biodiversity and Ecosystem 
Services (IPBES) should take 
citizen knowledge and non- 
monetary values into account 
to improve the science-policy 
interface for biodiversity 
protection. Even so, knowledge 
used to inform policy must be 
produced through an objective 
process if it is to withstand 
scrutiny. This demands a 
science-based approach. 

Science sets a standard for 
data quality, not for who collects 
the data. It provides a common 
currency for understanding the 
consequences for biodiversity of 
actions arising from the values of 
different stakeholders, including 
local communities, hunter- 
gatherers, commercial exploiters 
and conservationists. 

The role of IPBES in policy 
formulation means that it will 
inevitably meet resistance that 
will seek to undermine data 
credibility, the assessment process 
and the platform itself. Instead 
of avoidance strategies, we need 
mechanisms for successful 
negotiation of such controversies 
to support transformation. 
David A. Westcott, Frederieke 
J. Kroon, Andy W. Sheppard 
CSIRO Ecosystem Sciences, 
Australia. 
david.westcott@csiro.au


Change of heart on 
nanoparticle risks 


You contend that most 
nanotechnology researchers now 
acknowledge that some areas
of their work raise legitimate
environmental, health and 
safety concerns (Nature 488, 
576-579; 2012). This was not 
the case a decade ago, when we 
at the Action Group on Erosion, 
Technology and Concentration 
(ETC) called for a moratorium 
on the commercialization of 


products containing engineered 
nanoparticles. 

In 2002, scientists could point 
us to only one peer-reviewed 
study of nanotube toxicity, and 
companies were still sending a 
Material Safety Data Sheet for 
graphite with carbon nanotube 
shipments. ETC’s concerns 
were dismissed as alarmist. We 
welcome the change in attitude. 

ETC’s central concern has 
always been the economic impact 
on populations in developing 
countries resulting from the 
market disruptions that are 
expected with the advent of new 
nanoproducts and processes. We 
have consistently dismissed the 
hypothetical concept of ‘grey goo’ 
— uncontrolled self-replicating 
nanorobots — as a red herring. 

Finally, ETC has no 
connection to ITS, the group that 
claimed responsibility for the 
nanotech-related bombings in 
Mexico. ETC opposes violence in 
all forms. 

Silvia Ribeiro ETC Group, 
Mexico City, Mexico. 
etc@etcgroup.org 


Fleischmann denied 
due credit 


Philip Ball’s obituary of Martin 
Fleischmann (Nature 489, 34; 
2012), like many others, ignores 
the experimental evidence 
contradicting the view that cold 
fusion is ‘pathological science’ 
(see www.lenr.org). I gave an 
alternative perspective in my 
obituary of Fleischmann in
The Guardian (see go.nature.com/rzukfz),
describing what I
believe to be the true nature of 
what Ball calls a “Shakespearean 
tragedy”. 

The situation at the time of the 
announcement of cold fusion was 
confused because of errors in the 
nuclear measurements (neither 
Fleischmann nor his co-worker 
Stanley Pons had expertise in 
this area) and because of the 
difficulty researchers had with 
replication. Such problems are 
not unusual in materials science. 
Some were able, I contend, to
get the experiment to work (for
example, M. C. H. McKubre et al. 
J. Electroanal. Chem. 368, 55-56; 
1994; E. Storms and C. L. Talcott 
Fusion Technol. 17, 680; 1990) 
and, in my view, to confirm both 
excess heat and nuclear products. 

Scepticism also arose because 
the amount of nuclear radiation 
observed was very low compared 
with that expected from the 
claimed levels of excess heat.
But it could be argued that the
experiment never excluded the 
possibility that the liberated 
energy might be taken up 
directly by the metal lattice 
within which the hydrogen 
molecules were absorbed. 

In my opinion, none of 
this would have mattered had 
journal editors not responded 
to this scepticism, or to 
emotive condemnation of the 
experimenters, by setting an 
unusually high bar for publication 
of papers on cold fusion. This 
meant that most scientists were 
denied a view of the accumulating 
positive evidence. 

The result? Fleischmann was 
effectively denied the credit due 
to him, and doomed to become 
the tragic figure in Ball’s account. 
Brian D. Josephson University of 
Cambridge, UK. 
bdj10@cam.ac.uk 


European biodiesel 
can be sustainable 


An accurate evaluation of the 
sustainability of European oilseed 
rape for biodiesel production 
would be a useful resource in 
discussions of the European 
Union's bioenergy policies. Your 
ill-judged pronouncement in an 
online News report that rapeseed
biodiesel fails the sustainability 
test (Nature http://doi.org/ 
jdn; 2012) risks confusing the 
facts by quoting questionable 
figures from a preliminary study 
(G. Pehnelt and C. Vietze Jena 
Econ. Res. Pap. 039; 2012). 

These figures concern 
some of the most important 
parameters used in sustainability 
calculations. The study
considerably underestimated
mean annual seed yields of 
rapeseed used for biodiesel, 
by using outdated yield values 
from the entire European Union 
(around 2.8 tonnes per hectare 
for 1991-2005), rather than 
current yields from the principal 
biodiesel-producing countries 
such as Germany (3.8 tonnes 
per hectare for 2005-10; see 
http://faostat3.fao.org). The 
input values were also based on 
energy-intensive production 
procedures (deodorization, 
for example) that are only 
used in processing rapeseed 
oil for food, and on unrealistic 
transportation emission 
values. Incorrect input data can 
seriously influence the outcome 
of a sustainability evaluation. 
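
As a rough illustration of why the yield figure matters, the sketch below compares the two values quoted above. It assumes, purely for illustration, that cultivation emissions per hectare are the same in both cases, so that the emissions attributed to each tonne of seed scale inversely with yield. The variable names are hypothetical, and nothing here reproduces the calculation in the study itself.

```python
# Mean seed yields quoted in the letter, in tonnes per hectare.
eu_yield_1991_2005 = 2.8       # whole European Union, older period
germany_yield_2005_10 = 3.8    # principal biodiesel producer, recent period

# Fraction by which the older EU-wide figure falls short of the German one.
shortfall = 1 - eu_yield_1991_2005 / germany_yield_2005_10

# ASSUMPTION: per-hectare cultivation emissions are identical in both cases,
# so per-tonne emissions are inflated by the ratio of the two yields.
per_tonne_inflation = germany_yield_2005_10 / eu_yield_1991_2005

print(f"Yield shortfall: {shortfall:.1%}")
print(f"Per-tonne emission inflation factor: {per_tonne_inflation:.2f}")
```

Under that simplifying assumption, using the lower yield overstates the per-tonne cultivation burden by roughly a third.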
Political decisions need 
to be based on reasoned and 
constructive discussion about 
issues as controversial as 
renewable biofuels, which in 
turn must be based on strong, 
peer-reviewed science. 
Rod Snowdon, Wolfgang Friedt 
Justus Liebig University, Giessen, 
Germany. 
rod.snowdon@agrar.uni-giessen.de 


Reviews turn facts 
into understanding 


Your Editorial on h-index 
forecasting (Nature 489, 177; 
2012) perpetuates the myth that: 
“Review articles, which may not 
add much to the research, count 
the same as original research 
papers, which contribute a great 
deal.” Probably the reverse is 
true. A research paper usually 
provides just one or two new 
facts, whereas reviews synthesize 
our understanding more broadly 
and make it more concrete. 
Some reviews summarize 
thousands of papers (see, for 
instance, D. B. Kell BMC Med. 
Genom. 2, 2; 2009) and turn 
an inchoate and stochastic 
scientific literature into 
knowledge. 
Douglas B. Kell University of 
Manchester, UK. 
dbk@manchester.ac.uk 
FORUM: Climate science


The aerosol effect 


Anthropogenic aerosols in the atmosphere undoubtedly influence climate. But do the approaches taken in climate models to 
account for the effects of aerosols provide meaningful estimates of those effects? Two climate scientists offer their opinions. 


THE TOPIC IN BRIEF 

• Aerosol particles in the atmosphere
influence clouds, and thereby climate,
because they act as nuclei for cloud
formation.

• Computational models of climate
systems have sought to incorporate the
effects of aerosols on clouds through
parameterizations.


Grains of 
salt 


BJORN STEVENS 


There is something captivating about the
idea that fine particulate matter, suspended
almost invisibly in the atmosphere, holds the 
key to some of the greatest mysteries of climate 
science. Recent studies have reported that 
interactions of such aerosols with clouds may 
be hiding a large part of the sensitivity of global 
temperature to increasing levels of greenhouse
gases¹. It has also been claimed that aerosol-
cloud interactions are reshaping patterns of
rainfall² and even influencing the development
of tornadoes³. But such interactions invariably
turn out to be more nuanced than the simple
ideas underpinning these and related stud-
ies would lead one to suspect. This explains
why, despite decades of research, no consen-
sus has emerged as to exactly what, other than
perhaps a slight mitigation of twentieth-
century global warming, is attributable to
aerosol-cloud interactions⁴.

To put matters into context, consider the 
analogous question of the effect of carbon diox- 
ide on climate. Carbon dioxide always has the 
same composition, its lifetime is very much 
longer than that of atmospheric circulation 
systems, and it affects fluxes of radiant energy 
in ways that are well understood and are not 
particularly contingent on other factors. By con- 
trast, aerosol particles can differ widely in com- 
position, are ephemeral and affect clouds (and 
hence radiative fluxes) in ways that are poorly 
understood and depend on a long list of factors.


• However, the representation of
aerosol-cloud interactions in climate models
is based on simplifications that ignore
the complexity of the small-scale physical
processes governing such interactions in the
real world.

• The value of studying the effects of
aerosol-cloud interactions in climate models
has therefore been questioned.


To determine the correct sign, let alone
the magnitude, of the effect of some impor-
tant aerosol-cloud interactions, one may, to
borrow words from elsewhere, need a weath-
erman to know which way the wind blows⁵.
Models can make good weathermen, and thus 
provide information about the wind, as well as 
about many of the other factors on which aero- 
sol-cloud interactions depend. But for climate 
models, this is true only on scales that are so 
large (hundreds of kilometres) as to be almost 
irrelevant. On fine scales (tens of metres), at 
which there is some understanding of how 
aerosol particles influence cloud microstruc- 
ture, climate modellers are groping in the dark. 

Aerosol-cloud interactions are not only con- 
tingent on, but also compound, the ‘cloud ques- 
tion’. It is one thing to ask, as the cloud question 
does, how the distribution of clouds responds to 
robust changes in the large-scale environment 
— for instance, changes that follow temperature. 
It is quite another to ask how indeterminate 
changes in clouds will be modified by uncertain 
interactions with aerosol particles whose prop- 
erties are also not well understood. So, although 
some answers to the cloud question seem to be 
within reach, a quantitative understanding of 
the global effect of aerosols on clouds is defi- 
nitely not. Fortunately, given that global aerosol 
burdens have remained more or less constant 
for a decade or longer⁶, and that greenhouse-
gas concentrations are increasing relentlessly,
answering the cloud question might help to 
answer the more pressing questions related to 
Earth's changing climate. 

Our poor understanding of the global effect 
of aerosols on clouds means that the incorpo- 
ration of additional details into climate mod- 
els does not make the models fundamentally 
better or more reliable, just more complex.
Additional complexity can be great fun, but it 
should not disguise the fact that, at least for 
aerosol-cloud interactions, much is specula- 
tive, and the results of such complex models 
should be taken with a grain of salt. 


Bjorn Stevens is at the Max Planck Institute 
for Meteorology, Hamburg 20146, Germany. 
e-mail: bjorn.stevens@mpimet.mpg.de 


An essential 
pursuit 


OLIVIER BOUCHER 


Aerosols and clouds influence each other
through multiple interactions at tempo- 
ral and spatial scales that span many orders 
of magnitude. Increasingly detailed observa- 
tions and modelling studies have revealed the 
complexity of aerosol-cloud interactions. Such 
complexity was certainly not recognized when 
the ‘climate forcing’ associated with these inter- 
actions — that is, their effects on the amount of 
energy entering and escaping from the atmos- 
phere as radiation — was first estimated in an 
atmospheric general circulation model⁷. After
almost two decades of research, parameter- 
izations of these interactions in large-scale 
models remain simplistic in some respects, but 
they are neither worthless nor useless. 
Estimates of anthropogenic climate forcing 
are essential for constraining climate sensi-
tivity to such forcing in observations⁸. They
are also important for understanding recent
climate change⁹ and for improving regional dec-
adal climate prediction. Although small-scale
observations and models are invaluable for 
investigating how man-made aerosols might 
affect cloud properties, they cannot easily be 
scaled up because of the large variability of 
aerosols, clouds (Fig. 1) and environmental 
conditions, and because they overlook the 
feedbacks on aerosols and clouds that act 
through large-scale atmospheric dynamics. 
Satellite observations have been widely 
used to infer correlations between aerosols 


B. STEVENS. 


Figure 1 | Cloud diversity. Climate models must represent the net effect of clouds on solar and thermal 
irradiances over scales that often encompass a rich variety of cloud types, such as the diverse range in the 
cloud field pictured. Aerosol particles affect the microphysical structure of clouds, as well as irradiances 
in the cloud environment. The net effect of these interactions on the radiative properties of clouds is 
poorly understood, and incorporating them realistically into climate models is a challenge. 


and cloud properties or precipitation. They 
can be combined with observations of Earth’s 
radiative budget — the amount of energy that 
enters and leaves Earth’s atmosphere as radia- 
tion — to estimate aerosol forcing on a global 
scale. The methodological challenges of such 
an approach, however, are increasingly being 
recognized, because both aerosols and cloud 
properties depend hugely on meteorological 
conditions. Climate models, although imper- 
fect, are therefore indispensable for calculat- 
ing climate forcing by anthropogenic aerosols. 
Such models are the only means of represent- 
ing the feedbacks induced by aerosol-cloud 
interactions at the global scale. They can also 
be used to tease apart the interwoven effects 
of weather and aerosols on clouds, which 
currently impede observational studies¹⁰.

Estimates of climate forcing from aerosol- 
cloud interactions have not varied much over 
time, even though research has moved from 
empirical to more mechanistic approaches for 
parameterizing these interactions. This does 
not mean, however, that such estimates are 
robust. They often lack a proper treatment of 
uncertainties, and there are indications from 
temperature observations that many climate- 
model simulations of such interactions imply 
too much cooling. 

It is therefore vital to continue with efforts 
to parameterize aerosol-cloud interactions in 
climate models as a means of overcoming the 
present limitations. Indeed, some promising 
avenues of research are opening up. It is now 
possible to perform high-resolution simula- 
tions that incorporate complex microphysical 
representations of aerosol-cloud interactions 
over sufficiently large areas, and for long 
enough periods, to investigate the couplings 


between a cloud field and its environment, 
thus going some way towards bridging the 
gap between the cloud scale and the global 
scale. New methods for representing the sta- 
tistics of cloud properties at scales below those 
resolved by climate models have also been 
developed¹¹,¹². Finally, cloud microphysics
is now recognized as being important for
numerical weather prediction; systematic
verification of weather-model outputs against
observations, and powerful data-assimilation
techniques, may provide new insight into
aerosol-cloud interactions. With so many
encouraging developments, let’s embrace
the challenge!


Olivier Boucher is at the Laboratoire de
Météorologie Dynamique, IPSL/CNRS,
Université Pierre et Marie Curie, 75252 Paris
Cedex 05, France.
e-mail: olivier.boucher@lmd.jussieu.fr


NEWS & VIEWS


MUCOSAL IMMUNOLOGY


Infection induces


friendly fire 


Our immune system usually ignores ‘friendly’ gut bacteria. But when infection 
with a pathogen damages the intestine’s mucosal lining, the resident microbes 
can invade the body, inducing immune responses directed at themselves. 


DAVID MASOPUST & VAIVA VEZYS 


The mammalian gastrointestinal tract
harbours an extensive microbial com- 
munity that is usually well tolerated 
by the immune system. Sometimes, however, 
inappropriate immune responses directed 
against these ‘commensal’ organisms arise, such
as in inflammatory bowel disease. But how such
responses are triggered is unclear. Writing in
Science, Hand et al.¹ demonstrate that patho-
genic infections that disrupt the integrity of
the gut’s mucosal barriers can precipitate the 
development of long-lived immunity directed 
against the host’s resident microbes. 
CD4+ T cells are a class of immune cell that


deploys potent effector functions to combat 
infections. During development, a reper- 
toire of T cells is generated such that there 
will be low numbers of cells specific for any 
given antigen — a substance that stimulates 
responses from T and B cells of the immune 
system. ‘Naive’ T cells are those that have not 
yet encountered the antigen for which their 
own antigen receptor is specific. Infection with 
a pathogen stimulates proliferation of patho- 
gen-specific T cells, which not only increases 
the number of these cells, but also causes their 
differentiation to effector cells. As the pro- 
liferative response is ending, a population of 
long-lived pathogen-specific ‘memory’ cells is
established, which provides a heightened state 
Figure 1 | Breaching the barrier. a, The lumen of the mammalian intestine contains abundant
non-pathogenic bacteria, which are called commensal bacteria. Although there are CD4+ T cells in
distant lymphoid organs (such as the spleen or lymph nodes) that specifically recognize these ‘friendly’
bacteria, these T cells typically ignore the bacteria and remain in a naive state. b, Hand et al.¹ show that
infecting mice with a protozoan pathogen that causes intestinal damage allows the commensal bacteria
to migrate to other parts of the body. This results in activation and proliferation of both pathogen-specific
(not shown) and commensal-specific CD4+ T cells in the lymphoid organs, and migration of these cells
to the intestine. c, The authors also show that a population of commensal-specific memory T cells persists
in the intestine long after the resolution of infection and restoration of the mucosal barrier.


of readiness to respond to repeat infections. 
The signals received from different microbes 
cause the responding CD4+ T cells to adopt
effector functions best suited to control-
ling the infection². However, if a naive T cell
encounters its specific antigen in the absence 
of infection, it is rendered ‘tolerant’, prevent- 
ing a misdirected immune response. Thus, the 
context of initial antigen exposure is pivotal in 
determining T-cell behaviour. 

Hand et al. examined CD4+ T cells specific
for the flagellin protein expressed by certain 
Clostridium bacteria that are commensals in 
the gut lumen of adult mice (Fig. 1). They 
found that these non-pathogenic organ- 
isms are typically ignored by T cells, which 
remain in their naive state. However, when the 
researchers infected the mice with the parasitic 
protozoan Toxoplasma gondii, which causes 
intestinal damage, the clostridia breached the 
mucosal barrier and could be recovered from 
lymphoid organs such as lymph nodes and 
spleen. This translocation of the commensal 
bacteria caused naive Clostridium-flagellin-
specific CD4+ T cells residing in lymphoid
organs to proliferate and migrate to the gut. 

Interestingly, as the Clostridium-specific 
T cells proliferated, they differentiated into 
effector T cells in a manner that mimicked the
CD4+ T-cell response to T. gondii. They upreg-
ulated expression of T-bet, a gene-regulatory
protein, and they acquired the ability to pro-
duce the cell-signalling molecule interferon-γ.
These behaviours are characteristic of the 
Th1 differentiation state, which is associated 
with immune responses to intracellular infec- 
tions but has also been implicated in numer- 
ous autoimmune diseases. Hand and colleagues 
also show that the mice retained a population 
of flagellin-specific memory CD4+ T cells long
after resolution of the T. gondii infection and 
re-formation of the mucosal barrier. 

These results have several interesting 
ramifications. For instance, inflammatory 
bowel disease (IBD), which includes Crohn's 
disease and ulcerative colitis, is characterized 
by aberrant CD4+ T-cell responses directed
against intestinal commensal bacteria³.
Although it is not clear how and why these 
responses are initiated, it is known that varia- 
tions in several genes, including ones that reg- 
ulate the immune response to gut bacteria and 
maintenance of the mucosal barrier, are tightly 
linked to IBD incidence⁴. However, identical
twins with equal genetic susceptibility to IBD
do not always both experience disease, which
suggests a critical role for environmental
triggers in this condition⁵.

Hand et al. provide evidence that exposure 
to pathogenic microbes could provide such 
a trigger. First, their results demonstrate that 
CD4+ T-cell ignorance of commensal bacteria
can be broken during an infection, probably
because of the increased amount of micro- 
bial antigen that is available to stimulate the 
immune system. Second, they show that path- 
ogenic infections shape the differentiation 
program of contemporaneous commensal-
specific CD4+ T-cell responses. In other words,
when the commensal-specific CD4+ T cells are
first activated, it is in the context of infection, 
and so they adopt an effector program that is 
dictated by the offending pathogen rather than 
by the resident gut microbes. 

This differentiation program is distinct 
from that seen in experimental situations in 
which CD4+ T cells are exposed to commen-
sal microbes in the absence of pathogenic
infection⁶,⁷. It will be interesting to examine
the relationship between various infections 
and the development of IBD-like disease in 
genetically predisposed mouse models. This 
is likely to be a fruitful line of enquiry, because 
a recent study has shown that infection of 
genetically susceptible mice with murine noro-
virus tipped the balance towards Crohn’s-like
disease symptoms⁸.

Hand and colleagues’ study also emphasizes 
the fact that memory CD4+ T cells specific for
commensal microbes may exist. The mem- 
ory T-cell population studied by the authors, 
which was specific for a single peptide from 
Clostridium, was fairly small. However, when 
one considers that more than 3 million non-
redundant genes have been identified⁹ in the
human intestinal microbiome (all the micro- 
organisms in the gut), this raises the question 
of whether a substantial fraction of all memory 
CD4+ T cells may actually be directed against
commensal microbes. If this is the case, it 
would be particularly interesting to examine 
the effect of these cells on immune responses 
to subsequent intestinal infections (either 
with the same pathogen or one not previously 
encountered) that also cause disruptions of 
the mucosal barrier. It is conceivable that the 
commensal-specific memory CD4+ T cells have
a protective capacity, by amplifying immuno- 
logical sensing of barrier disruption. However, 
a commensal-reactive immune T-cell popula- 
tion could also contribute to inflammatory or 
autoimmune diseases of tissues exposed to such 
microbes, which include the skin and upper 
respiratory tract. 

Thus, in addition to illuminating a potential 
mechanism underlying inflammatory con- 
ditions such as IBD, Hand and colleagues’ 
findings indicate a potential limitation of 
the ‘specific-pathogen-free’ mouse model in 
which most basic immunological experiments 
are conducted. Such mice are not typically 
exposed to gut infections that compromise 
barrier integrity. This point provocatively sug- 
gests that, when studying immune responses 
to pathogens or commensal organisms, immu- 
nologists should also consider examining mice 
that have experienced past intestinal infections, 
because such infections could have a lasting 


impact on the immune system and might 
better recapitulate the human condition.
Cruise control


for a qubit 


Continuous feedback control — monitoring a system and adjusting its dynamics 
— is widely used to keep systems ‘on track’. This approach has now been used to 
maintain the cycling of a quantum bit almost indefinitely. SEE LETTER P.77 


HOWARD M. WISEMAN 


Say you have to test-drive a car
on a smallish circular racetrack, making
exactly one circuit every minute. There is
a big clock on a tower by the start/finish line.
What strategy do you adopt? An obvious one
is to keep comparing your position around the
track with the position of the second hand
around the clock. If your circular position
lags that of the second hand, then step on the
accelerator; if it leads, then ease off (Fig. 1).
This illustrates three typical features
of a process known as continuous
feedback control: monitoring, comparison
with a goal and dynamical adjustment¹.
On page 77 of this issue, Vijay et al.² apply
this type of control to the most basic quantum
system: a two-state system called a
quantum bit (qubit). By monitoring
the excitation of their qubit with
quite high efficiency, inferring its lag
or lead relative to a radio-frequency
field (their clock), and modulating
the power applied to their qubit, they
maintain a regular cycling between
the qubit’s excited and ground states
almost indefinitely.

In an ideal world, when the power
supplied by a car’s engine equals the
power dissipated (for example, by
air resistance and friction), the car’s
speed will remain steady. But in the
real world, unpredictable fluctuations
in the car and its environment
(such as variations in wind speed,
air pressure, tyre pressure, oil temperature,
road surface and other
variables) will cause the speed to vary
even with constant power. With no
feedback control, the position of the
car around the track would, within an hour,
become completely unrelated to the position 
of the second hand on the reference clock. 
In most modern cars, cruise control can do a 
good job of keeping the speed within a narrow 
range around a set value. But for the task in 
hand, a more sophisticated cruise-control 
mechanism would be needed, which compares 


Figure 1 | Locking cycles to an external clock. Vijay and colleagues² used
continuous feedback control to keep a quantum bit (qubit) cycling between
its ground and excited state. The process is analogous to a car driver trying
to make one lap every minute on a circular racetrack by monitoring the
position of the car on the track and comparing it with the position of the
second hand of a clock on a tower by the start/finish line. In this illustration,
the car is slightly ahead of where it should be, so the driver should ease off
the accelerator. In this manner, the driver can continuously compensate
for any fluctuations affecting the car’s speed. In a system that dissipates
power, such as the qubit, fluctuations are inevitable, and are also introduced
by the measurement of the qubit's state. In this analogy, the qubit’s excited
state is equivalent to the car being at a point (blue flag) that is diametrically
opposite to the start/finish line, which corresponds to the ground state.
the position of the car with the position of the
second hand, as described above. 

In their experiment, Vijay et al.² use a
transmon³, a tiny superconducting device that
can be restricted (at ultra-low temperatures) to 
just two quantum states: its ground state and 
its first excited state. These two states define 
the qubit. The qubit dynamics in this experi- 
ment is in many ways analogous to that of the 
car on the circular track. If the start/finish 
line is like the ground state of the qubit, the 
point diametrically opposite is the excited 
state. The other points around the track are 
analogous to superpositions of the excited and 
ground states — they are still definite (‘pure’) 
states of the qubit, but, unlike the ground 
and excited states, they do not have a definite 
energy value. 
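
One conventional way to write the points on this track, offered here as a textbook sketch rather than as the notation used by Vijay and colleagues, labels each point by an angle θ measured from the start/finish line. The symbols |g⟩, |e⟩, θ and Ω are assumptions introduced only for illustration.

```latex
% theta = 0 is the ground state; theta = pi is the excited state.
\[
  |\psi(\theta)\rangle
    = \cos\frac{\theta}{2}\,|g\rangle + \sin\frac{\theta}{2}\,|e\rangle ,
  \qquad
  P_e(\theta) = \sin^2\frac{\theta}{2} .
\]
% Ideal driving at a steady rate Omega advances the angle linearly in time:
\[
  \theta(t) = \Omega t
  \quad\Longrightarrow\quad
  P_e(t) = \frac{1 - \cos\Omega t}{2} .
\]
```

For intermediate angles the probability of finding the qubit excited lies strictly between 0 and 1, which is the sense in which those states lack a definite energy.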

By applying a microwave driving field, the 
qubit can be made to cycle around the ‘super-
position’ track at a particular frequency, clock-
wise from ground state to excited state and
back again. But like a car, the qubit
also suffers from power dissipation.
In the quantum world, this necessar-
ily gives rise to fluctuations, even at
zero temperature⁴. As a consequence,
after several microseconds the state 
of the qubit on its cycle will be 
completely unpredictable. 

It is not obvious a priori that 
using continuous measurement and 
feedback control to lock the qubit 
cycling to that of an external clock 
will work, because in the quantum 
regime monitoring induces fluctua-
tions too¹. However, as the original
theoretical proposal for this quan-
tum feedback protocol showed⁵, the
process should work well for strong 
and efficient monitoring. 

To monitor their transmon qubit, 
Vijay et al. coupled the qubit to a 
microwave probe field in a supercon- 
ducting resonator that has a higher 
frequency than the driving field. The 
probe field continuously leaks out of 
the resonator, and is amplified into 
a macroscopic current. The authors 
used a simple feedback-control algo- 
rithm to modulate the driving field 
in proportion to the deviation of 
the observed current from an ideal 
sinusoidal current generated by the 
radio-frequency field acting as the 
external clock. 
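
The control loop described here can be mimicked with a purely classical toy model, closer to the car on the racetrack than to the qubit. The sketch below is not a simulation of the experiment: it assumes a phase that drifts as a random walk, uses the sine of the phase lag as a stand-in for the deviation of the observed current from the ideal sinusoid, and ignores measurement back-action altogether. The function name, gain, noise strength and time step are all hypothetical.

```python
import math
import random


def rms_phase_error(gain, steps=20000, dt=1e-3, omega=2 * math.pi,
                    noise=0.5, seed=0):
    """Return the RMS lag (radians) between a noisy oscillator and its clock.

    gain = 0 means no feedback; gain > 0 applies proportional feedback.
    """
    rng = random.Random(seed)      # fixed seed so runs are repeatable
    phase = 0.0                    # position of the "car" on the track
    total_sq = 0.0
    for k in range(steps):
        target = omega * k * dt    # position of the clock's second hand
        lag = target - phase
        total_sq += lag * lag
        # Error signal: positive when lagging, negative when leading.
        error = math.sin(lag)
        # Proportional feedback: speed up when behind, ease off when ahead.
        rate = omega + gain * error
        # Euler-Maruyama step: deterministic drift plus a random fluctuation.
        phase += rate * dt + noise * math.sqrt(dt) * rng.gauss(0.0, 1.0)
    return math.sqrt(total_sq / steps)


free = rms_phase_error(gain=0.0)    # no feedback: the lag wanders freely
locked = rms_phase_error(gain=5.0)  # feedback pulls the lag back towards zero
print(f"RMS lag without feedback: {free:.2f} rad")
print(f"RMS lag with feedback:    {locked:.2f} rad")
```

In this toy model the unlocked lag grows like a random walk, whereas the feedback holds it near zero. The quantum case is harder because the monitoring itself adds fluctuations.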

So how well does it work? Although 
the monitoring was strong (compared with
the dissipation), its efficiency was less than 
50%. As a result, the feedback control could 
not do better than to keep the qubit state more 
or less in the correct half (as defined by the 
external clock) of the superposition track at 
any given time. Moreover, for about 10% of 
the time the transmon was found in a higher 
excited state than its first excited state, outside 
the qubit’s two states altogether. But compared 
with the no-feedback result of complete unpre- 
dictability within several microseconds, the 
observed stabilization of the qubit’s cycling is 
a big step forward in the feedback control of an 
individual qubit. 

This work paves the way to many more 


experiments on qubit feedback measurement 
and control, which would be enabled by more 
efficient monitoring than that achieved here. 
These experiments include rapid purification 
of a qubit⁶, tests of quantum-jump theory⁷,
feedback control based on measurement
back-action⁸ and feedback control based on
the quantum Zeno effect⁹. In the longer term,
continuous feedback control of multi-qubit 
systems is one path to correcting quantum 
errors in a quantum computer. With Vijay 
and colleagues’ experiment, solid-state phys- 
ics joins quantum optics at the forefront of 
quantum feedback-control investigations.


Howard M. Wiseman is at the Centre for 


Resident risks 


An innovative method for probing the genomes of the vast community of 
microorganisms that inhabit the human gut provides an alternative approach 
to identifying risk factors for type 2 diabetes. SEE LETTER P.55 


JULIA OH & JULIA A. SEGRE 


The interplay between genetics and the
environment in controlling our health
remains an open area of investigation.
Technological advances in genome sequenc- 
ing and analysis have empowered the search 
for the genetic determinants of human traits 
and diseases, but genetic variants that directly 
cause disease have so far been identified for 
only a small proportion of common dis- 
eases. For many other conditions, variants 
that increase disease risk are known, but the 
incompleteness of these associations suggests 
that environmental factors significantly con- 
tribute to the likelihood of development of 
these diseases. 

A candidate source of environmental 
influence on disease development that is 
gaining increasing attention is the human 
microbiome — the trillions of microorganisms 
that inhabit our bodies. Our primary genome 
is in continual interaction with the combined 
genomes of these residents, which makes this 
‘metagenome’ a fascinating part-environmen- 
tal, part-genetic factor in human biology. On 
page 55 of this issue, Qin et al.¹ present an
innovative method for searching the microbial 
metagenome for disease risk factors, using as 
their example type 2 diabetes, a disease that 
is reaching epidemic proportions in many 
regions across the world. 

Heritability estimates for the common forms 
of type 2 diabetes range between 30 and 60%, 
suggesting that it is a complex disorder with 
multiple interacting genetic and environmen- 
tal components. Genome-wide association 


studies (GWAS), which are used to investigate 
individuals’ DNA sequences for genetic vari- 
ants associated with disease, have identified 
variants that contribute to the development 
of type 2 diabetes, but these only partially 
explain the variation in individual risk². Other
significant risk factors for the disease, such as
obesity and diet, are also associated with an
altered gut microbiome³, suggesting that this
disease is a good candidate in which to search 
for risk factors associated with the human 
microbial metagenome. 

Qin et al. surveyed the gut metagenomes 
(derived from stool samples) of 345 Han Chi- 
nese individuals with and without type 2 diabe- 
tes using a method called shotgun sequencing, 
which produces random fragments of DNA 
sequence from the complex mix of microbial 
genomes. These fragments are then assembled 
into larger, gene-length fragments on the basis 
of sequence overlap. The authors compared 
these larger fragments with reference genome 
sequences and gene-function databases to 
determine the proportions of different bacte- 
rial species present in the samples, and conse- 
quently the relative representation of various 
microbial functional and metabolic attributes. 
They then used these ‘markers’ in a similar 
way to how variations at specific nucleotides 
are used in GWAS to identify blocks of DNA 
sequences, and carried out a ‘metagenome- 
wide association study’ (MGWAS) to iden- 
tify different features of the metagenomes in 
people with and without the disease. 
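
The idea of joining random fragments on the basis of sequence overlap can be illustrated with a deliberately naive greedy merger. This is a minimal sketch under stated assumptions, not the assembler used by Qin et al.: the reads are invented, error-free toy strings, and the function names and the `min_len` threshold are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that is a prefix of `b`.

    Returns 0 if no overlap of at least `min_len` bases exists.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing left to join; remaining reads stay separate
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads)
                 if idx not in (best_i, best_j)] + [merged]
    return reads


# Invented toy reads that tile a single short sequence.
toy_reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]
contigs = greedy_assemble(toy_reads)
print(contigs)
```

Real metagenomic assembly must also cope with sequencing errors, repeats and reads from many different genomes, which this sketch ignores.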

The MGWAS identified more than 
42,000 gene markers associated with type 2 
diabetes — a massive and unwieldy number 
of hits. To reduce the complexity of this gene
set and to remove redundant information, Qin 
et al. created metagenomic linkage groups 
(MLGs) of genes that showed similar profiles 
in either the patient or the control samples, 
in terms of taxonomic assignment or rela- 
tive abundance of the markers (Fig. 1). This 
approach addressed two major problems with 
current metagenomic analyses. First, analyses 
so far have depended on comparisons with 
available microbial reference sequences to 
assess the species and functional composition 
of the metagenome. Such studies are limited 
by the fact that many detected sequences do 
not map to these references, so uncharacter- 
ized or rare microbes may go undetected. 
But by using MLGs, rare gene regions can 
be associated with a larger MLG, and there- 
fore taken into account. Second, associations 
based on taxonomy alone can be misleading, 
as gene transfer between species or variation in 
gene content within a species can complicate 
taxonomic assignments. 
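
The idea of grouping markers by similar abundance profiles can be sketched as a greedy clustering on Pearson correlation. The marker names, abundance values, the 0.9 threshold and the greedy rule are all assumptions made for illustration; the clustering procedure of Qin et al. is not specified in this article.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length abundance profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile carries no co-variation signal
    return cov / (sx * sy)


def group_markers(profiles, threshold=0.9):
    """Greedy grouping: a marker joins the first group whose seed
    profile it correlates with at or above `threshold`."""
    groups = []  # each group is a list of marker names; index 0 is the seed
    for name, profile in profiles.items():
        for group in groups:
            if pearson(profiles[group[0]], profile) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


# Invented abundances of four markers across five samples.
profiles = {
    "gene_a": [5.0, 4.0, 6.0, 1.0, 0.5],
    "gene_b": [10.0, 8.5, 12.0, 2.0, 1.0],
    "gene_c": [0.5, 1.0, 0.2, 4.0, 5.0],
    "rare_gene_d": [0.05, 0.04, 0.06, 0.01, 0.005],
}
groups = group_markers(profiles)
```

Because correlation ignores absolute scale, the low-abundance `rare_gene_d` lands in the same group as `gene_a` and `gene_b`, which mirrors the point that rare gene regions can be carried along by a larger group.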

One of the associations with type 2 diabetes 
identified by Qin and colleagues was a depletion
of butyrate-producing bacteria. Butyrate
is a molecule used as an energy source by cells
in the gut lining, but it has also been suggested
to reduce inflammatory responses in the
colon. Consequently, a decrease in butyrate
may contribute to the increased intestinal
inflammation implicated in insulin resistance
and type 2 diabetes. Qin et al. also saw
an increase in colonization by opportunistic 
pathogens and an enrichment in microbial 
genes related to oxidative-stress resistance in 
people with the disease; both of these changes 
may further contribute to the inflammatory 
environment. 

However, before the results of this study can 
be accepted as providing generalizable risk 
factors for type 2 diabetes, further investiga- 
tions will be necessary. First, obesity was not 
a factor in this study, which — given the link 
between type 2 diabetes and obesity — limits 
its applicability. Similarly, diet, which is known 
to have a strong influence on the composition 
of the gut microbiota, must be considered in
Figure 1 | Mining the metagenome. Qin et al. studied the metagenome of the human gut (the
combined genomes of the trillions of its resident microorganisms) in the search for microbial species
and gene functions that are associated with type 2 diabetes. First, they identified microbial gene markers 
and determined the abundance of these markers in stool samples from a group of patients with type 2 
diabetes and from a control group of people without the disease. The authors then clustered these markers 
according to their relative abundance, and assigned markers that had similar abundance profiles within 
either the patient or the control group to a metagenomic linkage group (MLG). In this example, MLG1 is 
over-represented in the patient group, MLG2 is similarly represented in patient and control groups and
MLG3 is over-represented in the control group. The use of MLGs reduced the number of associations for 
subsequent analysis, minimized redundancy and allowed rare gene markers to be taken into account. 


future research. Butyrate is a by-product of 
the metabolism of plant-based foods, but this 
raises a ‘chicken-and-egg’ question of whether 
individuals with type 2 diabetes have a diet 
lacking in these foods to begin with, leading 
to differences in the composition of their gut 
microbiomes, or whether it is the microbiota 
that contributes to disease development. To 
clarify some of these questions, it would be 
of great interest for future investigations to 
cross ethnic lines and to follow patients on 
controlled diets over extended periods. Fur- 
thermore, it is important to note that this study 
was associative rather than causal, and that 
controlled prospective studies are needed to 
better distinguish a metagenomic contribution 
from potentially confounding factors such as 
diet and obesity. 

Beyond its contribution to our understand- 
ing of type 2 diabetes, this study introduced 
a novel approach — the application of tradi- 
tional elements of GWAS to metagenomic 
data, and the introduction of MLGs to analyse 
the association of metagenomic variants with 


a disease. This method will be applicable to a 
wide range of diseases or human traits that are 
likely to be influenced by the microbiome. The 
potential interactions between human genetic 
variation and metagenomic variation are stag- 
geringly complex, and we are only beginning 
to understand how the human genotype influ- 
ences the characteristics of its resident micro- 
biota, and how, in turn, the microbiome can 
influence human health. 

Finally, although MGWASs have great 
potential to account for some of the ‘missing 
heritability’ in diseases such as type 2 diabetes,
this same concept also leads to questions
about whether the microbiome, and its effect 
on human traits, is itself heritable. Infants 
inherit, or converge on, similar microbiota 
to that of their parents, through breast milk, 
physical contact and shared living. But the 
gut microbiome is also amazingly plastic, 
responding rapidly to such factors as dietary 
changes and antibiotic treatment. And so our 
growing understanding of the metagenome 
may in some cases further blur, rather than 
define, the boundaries between genetic and
environmental influence on human traits
and disease.
Two black holes found
in a star cluster 


The detection of two candidate black holes in a dense system of stars in the 
Milky Way suggests that a larger population of such objects might be lurking 


in this system. SEE LETTER P.71 


STEFAN UMBREIT 


The Milky Way globular cluster M22,
a massive, dense star cluster containing
close to a million stars, is starting
to reveal its secrets. On page 71 of this issue, 
Strader et al. present long-exposure radio
images of M22, obtained by the Karl G. Jansky
Very Large Array of radio antennas,
showing two previously unknown sources of
radio waves. The sources are candidates for
‘stellar-mass’ black holes, with masses 10-20
times that of the Sun. These are the first possible
black holes to be detected in massive,
old Milky Way star clusters such
as M22. Furthermore, by virtue of
the fact that two black holes were 
found, and were found in the clus- 
ter centre, the discovery provides 
insight into the dynamic evolution 
of the black-hole population. 

Although the existence of black 
holes was predicted by theory 
almost a century ago, it was only
in the past three decades that 
observational evidence became 
available. Because black holes are 
dark, the only way to detect them 
is through their gravitational influ- 
ence on surrounding matter. In 
most cases, this requires that the
black hole be in a binary system
with a companion that either
transfers mass to the black hole
(accretion), a process that emits
a large amount of energy as
X-rays, or is bright enough
for its radial velocity
to be measured.

In the present study, however, 
the main evidence for the inter- 
pretation of the two new sources 
as black holes comes from the 
relationship between the observed 




radio emission and the X-ray emission. 
The radio emission from a black hole is usually 
attributed to radiation from jets of gas emanat- 
ing from either side of a disk of gas, which is 
accreted from a companion star (Fig. 1). The 
X-ray emission results from strong shear in 
the inner part of the disk and from gas turbu- 
lence, which together heat the gas to such high 
temperatures that it releases X-rays. 
Although many of the details of the con- 
nection between the gas jet and disk remain 
unclear, observations of accreting, stellar-mass
black holes led to the determination of
a relationship between the X-ray and radio 




Figure 1 | Artist’s impression of an accreting black hole. Strader et al.
detected two stellar-mass black-hole candidates in a Milky Way globular cluster 
by analysing the systems’ radio and X-ray emissions. The radio emission is 
associated with a jet that emanates from both sides of a disk of gas, which is 
attracted to the black hole from a companion star. The X-ray emission is linked 
to the inner part of the disk. 
emissions, for low accretion rates. Thus, as
X-ray luminosity decreases, the radio emis- 
sion becomes increasingly dominant. Perhaps 
most importantly, observations as well as theoretical
studies show that the radio-to-X-ray
emission ratio increases with black-hole mass.
This makes radio observations ideally suited to
detect stellar-mass black holes.
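
The qualitative argument can be made concrete with a toy power law. The sketch below assumes that radio luminosity scales as L_R = norm * L_X**a * M**b with a < 1 and b > 0; the function names, the normalization and the exponent values (0.6 and 0.8) are illustrative assumptions, not figures taken from Strader et al. or from the observations cited above.

```python
def radio_luminosity(l_x, mass, a=0.6, b=0.8, norm=1.0):
    """Assumed scaling L_R = norm * L_X**a * M**b, in arbitrary units.

    The exponents are placeholders; only a < 1 and b > 0 matter here.
    """
    return norm * l_x ** a * mass ** b


def radio_to_xray_ratio(l_x, mass, **kwargs):
    """Ratio L_R / L_X, which scales as L_X**(a - 1) * M**b."""
    return radio_luminosity(l_x, mass, **kwargs) / l_x


# With a < 1 the ratio grows as the X-ray luminosity drops ...
faint = radio_to_xray_ratio(l_x=0.01, mass=10.0)
bright = radio_to_xray_ratio(l_x=1.0, mass=10.0)

# ... and with b > 0 it grows with black-hole mass at fixed L_X.
heavy = radio_to_xray_ratio(l_x=1.0, mass=20.0)
light = radio_to_xray_ratio(l_x=1.0, mass=10.0)
```

Under these assumptions, an upper limit on X-ray luminosity combined with a measured radio luminosity gives a lower limit on the ratio, and hence points towards a more massive accretor.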

Strader et al. argue that the fact that the 
two new radio sources were not detected by 
the Chandra X-ray satellite places an upper 
limit on the sources’ X-ray luminosity. When 
combined with their radio emission, this limit 
yields a large minimum radio-to-X-ray ratio, 
corresponding to stellar-mass black holes 
each with 10-20 solar masses. The interpre- 
tation of the sources as two black holes is 
compelling, not least because further sup- 
port comes from, among other independent 
evidence, their location close to the cluster 
centre. In a ‘self-gravitating’ stellar system (in 
which all the individual components are held 
under the combined gravity of the object as a 
whole) which is in thermal equilibrium, the 
average distance of an object from the centre 
is a function of its mass, with more-massive 
objects lying farther in. Given that the core of 
M22 would have reached approximate ther- 
mal equilibrium in a relatively 
short time (0.3 billion years) 
compared with the cluster’s age 
(12 billion years), the positions 
of the two sources can be used to 
determine the masses of the black 
holes. Using this approach, the 
authors deduced that the black 
holes are about 15 times more 
massive than the Sun — which is 
in accord with their previous cal- 
culation of 10-20 solar masses. 

The detection of two accret- 
ing black holes in a globular 
cluster has implications for our 
understanding of the structure 
and dynamic evolution of dense 
stellar systems. First, there may 
be more than two black holes in 
M22, either as single entities or 
in binary systems in which mass 
accretion does not occur. Theoretical
calculations of the formation
of binary systems composed
of a black hole and a white-dwarf
star in globular clusters indicate 
that, over a period of 10 billion 
years, only 2-40% of the total 
retained black-hole population 
forms black-hole-white-dwarf 
binaries that have observable gas accretion.
Therefore, if the sources detected in M22 
were in binaries with white dwarfs, M22 could 
contain as many as 100 black holes. Second, 
simulations of clusters that have a large population
of stellar-mass black holes showed that
the black-hole population leads to a substantial 
expansion of the cluster core. This is mainly 
caused by frequent ejections of the black holes 
from the core into the outer cluster regions by 
close gravitational encounters with other black 
holes. Strader et al. suggest that this expansion 
could explain why M22 has the fifth-largest 
core radius among bright Milky Way globular 
clusters. 
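
The figure of 100 black holes follows from simple division, sketched here. The function name is hypothetical, and the calculation assumes, as the text does, that both detected sources are accreting black-hole-white-dwarf binaries and that the quoted 2-40% range applies to M22.

```python
def implied_population(n_detected, accreting_fraction):
    """Total black holes implied if only `accreting_fraction` of them
    sit in binaries with observable accretion."""
    return n_detected / accreting_fraction


# Bounds from the quoted range of 2-40% observable binaries.
fewest = implied_population(2, 0.40)  # most optimistic detection rate
most = implied_population(2, 0.02)    # least optimistic detection rate
```

The two detections therefore correspond to a retained population of roughly 5 to 100 black holes, with the upper end matching the number quoted above.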

Finally, and perhaps most significantly, 
Strader et al. point out that the discovery of 
two stellar-mass black holes in M22 challenges 
a hypothesis that has been held for decades,
namely that the population of black holes disappears
rapidly through gravitational interactions,
such that only one, or a binary system of two,
black holes remain at the typical age of globular
clusters (10¹⁰ years). This would be all the
more important if the black holes detected in
M22 were part of a large population; Strader
and colleagues’ results indicate that more black 
holes may be coupled to the cluster. 

If many black holes are retained in globular 
clusters, we would expect an increased detec- 
tion of gravitational waves from the merging 
of black-hole binaries. Although the inter- 
action rate between black holes, and conse- 
quently the formation of black-hole-black-hole 
binaries, would be larger if the black holes were 
decoupled from the cluster, the rate of
destruction of potential black-hole binaries 
would likewise increase, leading overall to 
fewer merger events. Future gravitational-wave
searches will allow this expectation to
be tested, and the full impact of Strader and
colleagues’ findings to be revealed.
Tolerating pregnancy


The activity of specific suppressive immune cells, some of which persist to aid 
subsequent pregnancies, helps to explain how a pregnant female’s immune 
system tolerates fetal antigens inherited from the father. SEE LETTER P.102 


ALEXANDER G. BETZ 


Pregnancy poses a conundrum for the
immune systems of placental mammals.
A pregnant female’s immune system
has to defend both mother and fetus from
pathogens, while at the same time tolerating
the fetus, which contains antigens that the
maternal immune system recognizes as foreign
because they are the products of genes inherited
from the father. On page 102 of this issue,
Rowe et al. demonstrate that, during pregnancy,
immune cells called regulatory T cells
that recognize these paternal antigens proliferate
in the mother and specifically suppress the
maternal immune response against the fetus.
Furthermore, the authors show that a pool of
these cells remains long after delivery, facilitating
tolerance in subsequent pregnancies.
The description of this mechanism may, in 
the future, help to develop treatments for pre- 
eclampsia and prevent miscarriages resulting 
from immune rejection of the fetus.
Genetically, a fetus is half mother, half father. 
From an evolutionary perspective, maternal 
exposure to paternal antigens in the fetus is a 
relatively new problem: most animals lay eggs, 
so tolerance is not an issue. Yet physical attach- 
ment of the developing mammalian fetus to 
the mother’s uterine wall by the placenta pro- 
vides clear benefits — it allows gas exchange, 
nutrient uptake and waste disposal through the 
mother’s blood circulation, providing optimal 
conditions for the growth of the developing 
fetus. A systemic immune suppression to facili- 
tate this fetal ‘implantation’ would be much too 
50 Years Ago


La Vie par Jean Rostand et Andrée 
Tétry — Anyone glancing through 
this comprehensive, abundantly 
illustrated and well-produced 
volume could be forgiven for 
thinking life was very odd. Scarcely 
a page is passed that does not 
contain a picture of the abnormal, 
from the slightly unusual to the 
chimerical ... It begins with 
morality and religion, passes to 
the consideration of the living cell 
and the latest knowledge on this 
important topic ... and ends with 
some speculations on the future of 
the human race. One can but envy 
the authors their erudition — or 
the amount they must have learned 
in compiling the volume — and 
congratulate them on the fruits of 
their labours. 

From Nature 6 October 1962 


100 Years Ago 


Satisfied with the maintenance of 

a specious standard of chemical 
purity, the public has acquiesced in 
the elevation of sky-scrapers and 
the sinking cavernous places of 
business. Many have thus become 
cave-dwellers, confined for most 
of their waking and sleeping hours 
in windless places, artificially lit, 
monotonously warmed. The sun 

is cut off by the shadow of tall 
buildings and by smoke — the sun, 
the energiser of the world, the giver 
of all things which bring joy to the
heart of man, the fitting object 

of worship of our forefathers ... 
Modern civilisation has withdrawn 
many of us from the struggle with 
the rigours of nature ... I maintain
that the bracing effect of cold is of 
supreme importance to health and 
happiness, that we become soft 
and flabby and less resistant to the 
attacks of infecting bacteria in the 
winter, not because of the cold, but 
because of our excessive precautions 
to preserve ourselves from cold. 
From Nature 3 October 1912 
Figure 1 | Immune memory for a father’s antigens. Rowe and colleagues
show that the mouse immune response to a foreign antigen differs 

radically depending on the context. a, In virgin mice, antigen-specific 
helper T cells proliferate in response to infection with Listeria bacteria 
expressing the antigen, despite the presence of a small number of regulatory 
T cells with matching antigen specificity (regulatory T cells are suppressive 
immune cells that dampen the responses of other immune cells). b, By 
contrast, when the same antigen is presented by the fetus during 


risky because it would expose the mother and 
developing offspring to infection. So placental 
animals had to evolve a mechanism for local- 
ized and specific immune suppression. 

Rowe and colleagues have helped to clarify 
this mechanism. It was previously known that 
the mother’s immune system tolerates the fetus 
despite being fully aware of it and that a class
of immune cell called regulatory T cells has
a key role in this process. Regulatory T cells
act as suppressors of immune responses;
they are best known for their role in the pre- 
vention of autoimmune responses, but they 
are also involved in other functions, such as 
shutting down immune responses after suc- 
cessful elimination of a pathogen. These cells 
differentiate from precursor T cells in response 
to expression of just a single protein, Foxp3 
(ref. 5), either in the thymus during T-cell 
development (thymic regulatory T cells) or in 
peripheral immune organs, such as the spleen 
or lymph nodes, during immune responses 
(induced regulatory T cells). 

To assess how regulatory T cells promote 
immune tolerance of a fetus, Rowe et al. stud- 
ied the fate of antigen-specific T cells respond- 
ing to the antigen when encountering it either 
as a pathogen antigen or as a paternally derived
antigen during pregnancy. When the authors 
infected mice with Listeria bacteria express- 
ing the antigen, they observed antigen-specific 
‘helper’ T cells proliferating — the expected 
immune response to a foreign antigen. But 
the immune response they saw in pregnant 
mice when the same antigen was expressed 
by the fetus was characterized by a substantial 
expansion in the number of maternal antigen- 
specific regulatory T cells (Fig. 1). This was the 
result of both proliferation of antigen-specific 
thymic regulatory T cells, and the conversion 
of antigen-specific helper T cells in peripheral 




fetal antigen (d). 


organs to regulatory T cells through the 
induction of Foxp3. The authors show that 
this immune suppression is highly antigen- 
specific, which explains why the pregnant 
female’s ability to launch immune responses 
against infections is not affected. 

These results demonstrate that the context 
of pregnancy can determine whether a foreign 
antigen is attacked or tolerated. The crucial 
role of regulatory T cells in this process pro- 
vokes the question of whether these cells 
might have influenced the evolution of pla- 
cental implantation of the fetus. Two recent 
comparative-genomics studies provide some
clues. Although a Foxp3-like gene is present in
fish, the protein it encodes lacks key domains
required for the commitment of T cells to the
regulatory lineage and for their function.
Intriguingly, it seems that Foxp3 was lost from
the genome of birds, but retained in mammals,
in which it acquired additional functional
domains and additional elements that regulate
its expression. Foxp3 in monotremes (egg-laying
mammals) contains most, if not all, of
the functional domains, which indicates that
regulatory T cells evolved before the evolution 
of fetal implantation. Thus, it is tempting to 
speculate that the evolutionary gain of these 
cells facilitated the evolution of invasive pla- 
centation, because it provided the maternal 
immune system with a mechanism to tolerate 
the fetus without unduly compromising its 
responsiveness to pathogens. 

Intriguingly, Rowe et al. also show that the 
number of fetal-antigen-specific regulatory 
T cells persists at elevated levels long after 
delivery of the baby, and that the cells rapidly 
proliferate during subsequent pregnancies 
(Fig. 1). This is reminiscent of the regulatory 
T-cell ‘memory’ that is seen for some self-antigens
and that helps to prevent autoimmunity.


pregnancy, antigen-specific regulatory T cells accumulate, as a result 

of both proliferation of existing regulatory T cells and the differentiation 
of regulatory T cells from helper T cells (induced regulatory T cells). 

c, d, Rowe and colleagues also show that a memory pool of these antigen- 
specific regulatory T cells persists long after delivery (c) and that, in 
subsequent pregnancies with the same father, these memory regulatory 
T cells rapidly proliferate and provide immune tolerance to the same 


These fetus-specific memory cells might 
explain why pre-eclampsia (a condition asso- 
ciated with impaired maternal immune toler- 
ance of the fetus) is predominantly a disease 
of the first pregnancy unless there is a change 
in father. 

The generation of regulatory T-cell memory 
towards fetal antigens is part of a broader 
change in the immunological status quo 
that is triggered by pregnancy. For example, 
some autoimmune diseases, such as arthritis, 
are temporarily ameliorated, and regulatory 
T cells have been shown to be responsible for 
this beneficial effect. Unfortunately, the diseases
return with a vengeance after delivery
and do not seem to benefit from the genera- 
tion of protective regulatory T-cell memory. 
However, studies building on Rowe and 
colleagues’ report of regulatory T-cell persis- 
tence following pregnancy may in the future 
help to harness elements of this process for the 
treatment of autoimmune disease.
The oyster genome reveals stress
adaptation and complexity of
shell formation


Guofan Zhang, Xiaodong Fang, Ximing Guo, Li Li, Ruibang Luo, Fei Xu, Pengcheng Yang, Linlin Zhang,
Xiaotong Wang, Haigang Qi, Zhiqiang Xiong, Huayong Que, Yinlong Xie, Peter W. H. Holland, Jordi Paps, Yabing Zhu,
Fucun Wu, Yuanxin Chen, Jiafeng Wang, Chunfang Peng, Jie Meng, Lan Yang, Jun Liu, Bo Wen, Na Zhang, Zhiyong Huang
The Pacific oyster Crassostrea gigas belongs to one of the most species-rich but genomically poorly explored phyla, the
Mollusca. Here we report the sequencing and assembly of the oyster genome using short reads and a fosmid-pooling 
strategy, along with transcriptomes of development and stress response and the proteome of the shell. The oyster 
genome is highly polymorphic and rich in repetitive sequences, with some transposable elements still actively 
shaping variation. Transcriptome studies reveal an extensive set of genes responding to environmental stress. The 
expansion of genes coding for heat shock protein 70 and inhibitors of apoptosis is probably central to the oyster’s 
adaptation to sessile life in the highly stressful intertidal zone. Our analyses also show that shell formation in molluscs 
is more complex than currently understood and involves extensive participation of cells and their exosomes. The oyster 


genome sequence fills a void in our understanding of the Lophotrochozoa. 


Oceans cover approximately 71% of the Earth’s surface and harbour 
most of the phylum diversity of the animal kingdom. Understanding 
marine biodiversity and its evolution remains a major challenge. The 
Pacific oyster C. gigas (Thunberg, 1793) is a marine bivalve belonging to 
the phylum Mollusca, which contains the largest number of described 
marine animal species. Molluscs have vital roles in the functioning of
marine, freshwater and terrestrial ecosystems, and have had major 
effects on humans, primarily as food sources but also as sources of dyes, 
decorative pearls and shells, vectors of parasites, and biofouling or 
destructive agents. Many molluscs are important fishery and aquacul- 
ture species, as well as models for studying neurobiology, biomineraliza- 
tion, ocean acidification and adaptation to coastal environments under 
climate change. As the most speciose member of the Lophotrochozoa,
phylum Mollusca is central to our understanding of the biology and 
evolution of this superphylum of protostomes. 

As sessile marine animals living in estuarine and intertidal regions, 
oysters must cope with harsh and dynamically changing environments. 
Abiotic factors such as temperature and salinity fluctuate wildly, and 
toxic metals and desiccation also pose serious challenges. Filter-feeding 
oysters face tremendous exposure to microbial pathogens. Oysters do 
have a notable physical line of defence against predation and desic- 


cation in the formation of thick calcified shells, a key evolutionary 
innovation making molluscs a successful group. However, acidification 
of the world’s oceans by uptake of anthropogenic carbon dioxide poses a 
potentially serious threat to this ancient adaptation. Understanding
biomineralization and molluscan shell formation is, thus, a major
area of interest. Crassostrea gigas is also an interesting model for
developmental biology owing to its mosaic development with typical 
molluscan stages, including trochophore and veliger larvae and 
metamorphosis. 

A complete genome sequence of C. gigas would enable a more 
thorough understanding of oyster biology and the evolution of 
Lophotrochozoa. One of the main challenges, however, is the high 
levels of polymorphism present in oysters and many marine invertebrates.
To overcome this, an oyster derived from four generations of
full-sibling mating (coefficient of inbreeding, F = 0.59) was used for 
genome sequencing and assembly (Supplementary Text B1) through 
fosmid pooling, next-generation sequencing (NGS) and hierarchical 
assembling. Combining these genomic data with transcriptomes from 
different organs, different developmental stages and adults challenged 
with stressors, in addition to mass spectrometric analysis of shell 
proteins, allowed us to explore characteristics of the oyster genome 
and key aspects of molluscan biology related to stress response and
shell formation. 


Sequencing and hierarchical assembly 

NGS technology has been successfully applied for de novo genome 
sequencing and assembly using whole-genome shotgun strategies.
We initially generated 155-fold Illumina whole-genome shotgun 
reads (Supplementary Table 1), but could not adequately assemble 
them owing to high levels of polymorphism and abundant repetitive 
sequences (Supplementary Text B2 and Supplementary Fig. 1). As 
possible alternative sequencing strategies—such as the addition of 
longer Roche 454 reads or traditional bacterial artificial chromosome
(BAC)-to-BAC sequencing—are expensive, we opted instead for a more 
cost-effective fosmid-pooling strategy. In brief, a fosmid library was con- 
structed, and 145,170 clones (~tenfold genome coverage) were evenly 
and randomly assigned into 1,613 pools, each of which was sequenced to 
~60-fold depth and assembled separately (Fig. 1 and Supplementary 
Table 1). Contigs from each pool were merged into supercontigs, totalling 
1,002 megabases (Mb) (Supplementary Text B4.1-3), which was larger 
than genome-size estimates of 637 Mb from flow cytometry or 545 Mb 
from k-mer (k-base fragment) analysis (Supplementary Text B1, 2.3), 
owing to failure of some allelic variants to merge (Supplementary Figs 
3 and 4). Self-to-self whole-genome alignment with LASTZ and sequencing
depth information were used to remove redundancy in the assembly
(Supplementary Text B4.4). The resulting 446 Mb of the assembly were 
retained for further scaffolding using paired-end data (Fig. 1). The final 
assembly comprised 559 Mb, with a contig N50 size (at which 50% of 
assembly was covered) of 19.4 kilobases (kb) and a scaffold N50 size of 
401 kb (Supplementary Text B4.5 and Supplementary Table 3). Over 90% 
of the assembly was covered by the longest 1,670 (14%) scaffolds. 
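
The N50 statistic quoted above can be computed as follows. This is a minimal sketch of the common definition (the length at which the longest sequences together cover half of the total); the function name and the toy lengths are assumptions, and assemblers may differ in details such as how ties or gaps are handled.

```python
def n50(lengths):
    """Return the length L such that sequences of length >= L,
    taken longest first, cover at least half of the total length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("n50 is undefined for an empty list of lengths")


# Toy contig lengths in kilobases (invented numbers).
toy_contigs_kb = [2, 2, 2, 3, 3, 4, 8, 8]
toy_n50 = n50(toy_contigs_kb)
```

A higher N50 means that more of the assembly sits in long pieces, which is why the scaffold N50 (401 kb) is so much larger than the contig N50 (19.4 kb) once paired-end links are used.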

To assess the completeness of the assembly, 105-fold coverage of 
short-insert library reads (<2kb) that participated in assembly 
(Supplementary Table 1) were aligned against the assembly. Over 
99% of these reads were successfully mapped, using a combination 
Figure 1 | Fosmid-pooling strategy for oyster genome assembly. Genomic
DNA was randomly sheared into fragments. a, b, A 40-kb-insert fosmid library 
was constructed (a), and 145,170 fosmid clones were randomly selected and 
assigned into 1,613 pools, each containing 90 clones covering 0.57% of the 
diploid genome (b). For each pool, three Illumina short-insert barcoded 
libraries (two 200 bp and one 500 bp) were constructed and ~60-fold coverage 
of 90-bp reads (20-fold per library) were generated, and assembled using 
SOAPdenovo with optimizing parameters. Assemblies from each pool were 
further corrected and reassembled if unexpected connections were detected 
owing to high similarity sequences from different fosmids, and gaps were filled 
by local assembly. c, Fosmid scaffolds were split into contigs at unfilled regions, 
leaving no undetermined bases in the sequences. Each base was assigned a 
Phred-like quality score determined by its coverage and alignment mismatches, 
and these sequences were merged into supercontigs using the overlap layout 
consensus method. Redundancy was removed using self-to-self alignment and 
sequencing depth information. d, Whole-genome shotgun Illumina libraries 
(200-bp to 20-kb inserts) from sheared genomic DNA were constructed for 
mated-pair Illumina sequencing. e, The fosmid supercontigs were linked into 
scaffolds using (1) the whole-genome shotgun sequences; (2) inferred paired- 
end information extracted from assembled pool scaffolds with a span size 
ranging from 50 bp to 37.5 kb; and (3) 225,000 fosmid ends sequenced using 
Sanger technology. 
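
The pooling numbers in this caption can be checked with a few lines of arithmetic. The sketch below takes the clone count, pool count and insert size from the text; using the 637 Mb flow-cytometry estimate as the reference genome size is an assumption made here, as are the variable names.

```python
CLONES = 145_170   # fosmid clones selected
POOLS = 1_613      # pools sequenced separately
INSERT_KB = 40     # fosmid insert size in kilobases
GENOME_MB = 637    # flow-cytometry estimate; an assumed reference size

# Clones assigned to each pool.
clones_per_pool = CLONES / POOLS

# Sequence carried by one pool, in megabases.
pool_mb = clones_per_pool * INSERT_KB / 1000

# Share of the genome represented in one pool.
pool_fraction = pool_mb / GENOME_MB

# Total clone coverage of the genome across all pools.
clone_coverage = CLONES * INSERT_KB / 1000 / GENOME_MB
```

With these inputs each pool holds 90 clones, or 3.6 Mb, which is about 0.57% of the reference size, and the whole library gives roughly ninefold clone coverage, in line with the approximately tenfold figure in the main text. Keeping each pool this small makes it unlikely that both alleles or several copies of a repeat land in the same pool.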
of Burrows-Wheeler Aligners and the more sensitive LASTZ
(Supplementary Fig. 5 and Supplementary Table 4). The integrity of
the assembly is further demonstrated by the successful mapping of 99%
of the BAC sequences obtained using the Sanger sequencing technique,
and 98% of ~68,000 expressed sequence tags from 454 sequencing 
(Supplementary Text B5, Supplementary Fig. 6 and Supplementary 
Tables 5 and 6). Fosmid pooling has been used for re-sequencing,
and our results show that the combination of fosmid pooling, NGS and 
hierarchical assembly provides a new, cost-effective alternative for de 
novo sequencing and assembly of complex genomes. 


Polymorphism and repetitive sequences 


To understand polymorphism in the oyster genome, we analysed 
allelic variation in the assembled genome (inbred) and one re- 
sequenced wild oyster (wild) (Supplementary Text C1). The inbred 
genome contained 3.1 million single-nucleotide polymorphisms and 
258,405 short insertions/deletions (indels, 1-40 base pairs (bp)), yielding
a sequence polymorphism rate of 0.73%, whereas the wild genome
had 3.8 million single-nucleotide polymorphisms and 238,182 indels,
or a polymorphism rate of 1.3% (Supplementary Table 7), comparable to
previous estimates. This 44% reduction in polymorphism in the inbred
genome is smaller than the 59.4% predicted from four generations of
brother-sister mating, indicating that selection favouring heterozygotes
had occurred. The polymorphism combining inbred and wild (among
four haplotypes) was 2.3%, higher than that in most studied animal
genomes but comparable to that in known high-polymorphism
species. In inbred and wild, we found 3,094 short indels located in
coding regions inferred to cause frameshift variants in 2,665 genes, 
providing an important source for recessive lethal mutations. 
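
The 59.4% expectation is consistent with the standard recurrence for the inbreeding coefficient under full-sib mating, F(t) = (1 + 2F(t-1) + F(t-2))/4. The minimal sketch below assumes unrelated, non-inbred founders and treats the expected proportional loss of heterozygosity as equal to F; the function names are illustrative and are not taken from the study.

```python
def full_sib_inbreeding(generations):
    """Inbreeding coefficient F after n generations of brother-sister mating.

    Assumes the founding pair is unrelated and non-inbred (F = 0).
    """
    f_prev2, f_prev1 = 0.0, 0.0
    for _ in range(generations):
        # F(t) = 1/4 * (1 + 2*F(t-1) + F(t-2))
        f_prev2, f_prev1 = f_prev1, 0.25 * (1.0 + 2.0 * f_prev1 + f_prev2)
    return f_prev1


def observed_reduction(inbred_rate, wild_rate):
    """Proportional drop in polymorphism of the inbred genome relative to the wild one."""
    return 1.0 - inbred_rate / wild_rate


predicted = full_sib_inbreeding(4)          # four generations, as in the text
observed = observed_reduction(0.73, 1.3)    # polymorphism rates (%) from the text
print(f"predicted: {predicted:.1%}, observed: {observed:.1%}")
```

An observed reduction smaller than the predicted one is what the text interprets as evidence of selection favouring heterozygotes.
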
k-mer-based analysis of the oyster genome showed that ~35% of 
17-mers had at least two identical copies in the genome, suggesting an 
abundance of repetitive sequences (Supplementary Fig. 1). Similarly, 
homology searching and ab initio prediction found 202 Mb (36% of 
the genome) in repetitive sequences (Supplementary Text C2 and 
Supplementary Table 8). Over 62% of the detected repeats could 
not be assigned to known categories, reflecting the paucity of genomic 
information from molluscs”. Large numbers of transposase (359) and 
reverse-transcriptase (779) gene fragments were detected; over 96% of 
these had detectable transcripts (Supplementary Fig. 8). Alignment of 
the wild sequence against the assembly identified 20,605 deletions 
(>100 bp), over 80% of which overlapped with detected transposable 
elements, suggesting that transposable elements may have an active 
role in shaping genome variation. Using MITE-hunter”’, we detected 
157,007 copies of miniature inverted-repeat transposable elements 
(MITEs), accounting for a remarkable 8.82% of the genome (Sup- 
plementary Text C2.3 and Supplementary Table 9). Pair-wise com- 
parisons show extremely low sequence divergence in some MITE families 
(Supplementary Fig. 9), indicating that they may still be active. 
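
As a rough illustration of the k-mer statistic quoted at the start of this section, the sketch below counts k-mers in a toy sequence and reports the fraction of distinct k-mers that occur at least twice. It is only one possible reading of the statistic: it assumes forward-strand counting on an assembled sequence, and the function name and toy input are hypothetical. Genome-scale analyses would normally rely on a dedicated k-mer counter.

```python
from collections import Counter


def repeated_kmer_fraction(sequence, k):
    """Fraction of distinct k-mers that appear two or more times in `sequence`."""
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    if not counts:
        return 0.0  # sequence shorter than k
    repeated = sum(1 for c in counts.values() if c >= 2)
    return repeated / len(counts)


# Toy example (k = 4 instead of 17 so that repeats are visible by eye).
toy = "ACGTACGTTTGA"
print(repeated_kmer_fraction(toy, 4))
```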


Gene annotation and developmental genomics 


A total of 28,027 genes were predicted encoding 50 amino acids or more 
by combining de novo prediction and evidence-based searches using 
reference genomes, oyster expressed sequence tags and transcriptomes 
from multiple organs and developmental stages (Supplementary Text 
D1 and E1 and Supplementary Fig. 11), with 96.1% showing expression 
(reads per kb per million mapped reads (RPKM) > 1 in at least one 
transcriptome; Supplementary Text D2). Of the inferred proteins, 
21,085 matched entries in the SWISS-PROT, InterPro or TrEMBL 
databases. These genes plus their transcriptome profile from 7 adult 
organs and at 38 developmental stages provide valuable resources for 
comparative genomics analysis (Supplementary Text E2 and 3), func- 
tional inference and studies on development and organogenesis 
(Supplementary Text F2 and Supplementary Fig. 15). 
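
The expression criterion used above (RPKM > 1 in at least one transcriptome) can be sketched as follows. RPKM is computed from its usual definition, reads per kilobase of gene per million mapped reads; the function names and the example numbers are illustrative assumptions, not values from the study.

```python
def rpkm(gene_reads, gene_length_bp, total_mapped_reads):
    """Reads per kb of gene per million mapped reads."""
    return gene_reads * 1e9 / (gene_length_bp * total_mapped_reads)


def is_expressed(rpkm_by_transcriptome, threshold=1.0):
    """True if the gene exceeds the threshold in at least one transcriptome."""
    return any(value > threshold for value in rpkm_by_transcriptome)


# Hypothetical gene: 500 reads on a 2-kb gene in a library of 20 million mapped reads.
print(rpkm(500, 2000, 20_000_000))
print(is_expressed([0.2, 0.9, 1.4]))
```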

One notable finding of developmental interest is that the oyster 
Hox gene cluster is broken into four sections (Fig. 2) with flanking 
non-Hox genes (Supplementary Fig. 16). We did not find a clear 
Antennapedia gene, which is present in other bivalves such as Pecten 
and Yoldia** (Supplementary Fig. 17). Disruption of the Hox cluster, 
as also observed in tunicates, nematodes and drosophilids, has been 
attributed to the loss of temporal co-linearity and modified 
developmental control**. Supporting this model, we find that Hox genes 
in the oyster are not activated in an order matching their identity or 
genomic position, with, for example, HOX4 and HOX1 peaking before 
gastrulation, LOX5 and POST2 during the trochophore stage and 
HOX5 during the pediveliger stage (Supplementary Fig. 18 and 
Supplementary Table 15). 


Figure 2 | Clustering of Hox genes in Pacific oyster Crassostrea gigas, 
polychaete annelid Capitella teleta, fruitfly Drosophila melanogaster, 
lancelet Branchiostoma floridae and Homo sapiens. Oblique lines indicate 
regions of Hox cluster that are non-contiguous or interrupted. Blue denotes 
anterior Hox genes, yellow denotes paralogy group 3 Hox genes, green and 
purple denote central Hox genes and red denotes posterior Hox genes. 


Adaptation to environmental stress 

Comparison with seven other sequenced genomes identified 8,654 
oyster-specific genes (Supplementary Text E3.1) that are probably 
important in the evolution and adaptation of oysters and other 
molluscs. With the oyster being the only mollusc represented, these 
genes could be shared by other molluscs. Among these genes, gene ontology 
terms related to ‘protein binding’, ‘apoptosis’, ‘cytokine activity’ 
and ‘inflammatory response’ are highly enriched (P < 0.0001; 
Supplementary Text E2 and Supplementary Table 17), indicating 
over-representation of some host-defence genes against biotic and 
abiotic stress. Manual examination shows that several gene families 
related to defence pathways, including protein folding, oxidation and 
anti-oxidation, apoptosis and immune responses, are expanded in C. 
gigas (Fig. 3a and Supplementary Table 18). The oyster genome contains 
88 heat shock protein 70 (HSP70) genes, which have crucial roles 
in protecting cells against heat and other stresses, compared with ~17 
in humans and 39 in sea urchins. Phylogenetic analysis shows that 71 
oyster HSP70 genes cluster among themselves, suggesting that the 
expansion is specific to the oyster (Supplementary Fig. 19). Also expanded 
are cytochrome P450 (Supplementary Fig. 20) and multi-copper 
oxidase gene families, which are important in the biotransformation 
of endobiotic and xenobiotic chemicals”, and extracellular superoxide 
dismutases, which are important in defence against oxidative stress. 
The oyster genome has 48 genes coding for inhibitor of apoptosis 
proteins (IAPs), compared with 8 in humans and 7 in sea urchins, 
indicating a powerful anti-apoptosis system in oysters. Genes encoding 
lectin-like proteins, including C-type lectin, fibrinogen-related 
proteins and C1q domain-containing proteins (C1QDCs), are highly 
over-represented in the oyster genome (P < 0.0001; Supplementary 
Table 18); these genes have important roles in the innate immune 
response in invertebrates’”’’. Interestingly, many immune-related 
genes, including genes coding for Gram-negative bacteria-binding 
proteins, peptidoglycan-recognition proteins, defensin, 
C-type-lectin-domain-containing proteins and C1QDCs, are highly expressed 
in the digestive gland (Supplementary Fig. 21), indicating that the 
digestive system of this filter feeder is an important first-line defence 
organ against pathogens. 


Figure 3 | Expansion, expression and pathway distribution of 
defence-related genes in Crassostrea gigas. a, Expansion and expression of key genes 
in major stress-response pathways in C. gigas. Genes include HSPs and HSF in 
the heat-shock response; GRP78, CRT, CNX, GRP94, PERK, IRE1 and EIF2α in 
the endoplasmic reticulum unfolded-protein response (UPR); IAPs, BCL2-like, 
BAG, BI1, caspases, FADD and TNFR in apoptotic pathways; CYP450 and 
MO in oxidation; and SOD, GPX, PRX and CAT in anti-oxidation. Boxes with 
bold black borders indicate gene families (HSPs, IAPs and SODs) expanded in 
C. gigas, and the filled colours correspond to their degree of upregulation in 
RPKM_treatment/RPKM_control by stress, found in 61 transcriptomes from oysters 
challenged with 9 types of stressors (Supplementary Text G2 and 
Supplementary Table 23). b, Venn diagram of common and unique genes 
expressed in response to temperature, salinity, air exposure and heavy-metal 
stress (zinc, cadmium, copper, lead and mercury), showing overlap of 
responses. c, Number of genes with and without detectable paralogues 
differentially expressed under stress and normal conditions, showing that genes 
responding to stress are more likely to have paralogues (P < 1 × 10^-10; χ² test). 
Green sections of the pie chart represent 1,442, 809, 358, 550 and 7,938 
paralogues for air exposure, metal, temperature, salinity and normal 
conditions, respectively. 


4 OCTOBER 2012 | VOL 490 | NATURE | 51 


©2012 Macmillan Publishers Limited. All rights reserved 

To investigate genome-wide responses to stress, we sequenced 61 
transcriptomes from C. gigas subjected to nine stressors, including 
temperature, salinity, air exposure and heavy metals (Supplementary 
Text G1 and Supplementary Tables 19 and 20). We found that 5,844 
genes were differentially expressed under at least one stressor, and 
genes responding to different stressors showed significant overlap 
(Fig. 3b and Supplementary Fig. 23a). Air exposure induced a res- 
ponse from the largest number of genes (4,420), indicating that air 
exposure is a major stressor and that oysters have evolved an extensive 
gene set in defence. Genes differentially expressed in response to stress 
are more likely to have paralogues (Fig. 3c), suggesting that expansion 
and selective retention of duplicated defence-related genes are probably 
important to oyster adaptation. Under most stressors, genes coding for 
HSPs, histones, IAPs and protein biogenesis were upregulated, and 
those for protein degradation downregulated, pointing to concerted 
responses to maintain cellular homeostasis*’ (Supplementary Text G3 
and Supplementary Table 21). Genes involved in the unfolded protein 
response to cellular stress in the endoplasmic reticulum (coding for 
calreticulin, calnexin, 78- and 94-kDa glucose-regulated proteins) were 
upregulated, indicating that protein quality control is critical in cellular 
homeostasis under stress. 

Air exposure induced up to 67-fold upregulation of five highly 
expressed IAPs (Supplementary Fig. 24a). Other inhibitors of 
apoptosis were also upregulated: BCL2 up to fourfold and BAG up 
to 12-fold (Supplementary Fig. 24b). These apoptosis inhibitors were 
also highly upregulated under heat and low salinity stress. These find- 
ings, along with the expansion of IAPs, suggest that a powerful anti- 
apoptosis system exists and may be critical for the amazing endurance 
of oysters to air exposure and other stresses. The existence of an 
intrinsic apoptosis pathway in invertebrates has been controversial, 
and parts of the pathways have only recently been demonstrated for 
two lophotrochozoans*’*’. The finding of key genes belonging to both 
intrinsic (BAX, BAK, BAG, BCL2, BI1 and procaspase) and extrinsic 
(TNFR and caspase 8) apoptosis pathways indicates that oysters have 
advanced apoptosis systems. Powerful inhibition of apoptosis as shown 
by genomic and transcriptomic analyses may be central to the ability of 
oysters to tolerate prolonged air exposure and other stresses. 

Heat stress induced a ~2,000-fold increase in expression of five 
highly inducible HSP70 genes or a 13.9-fold increase in average 
expression of all HSP70 genes, amounting to 4.2% of all transcripts 
(Supplementary Figs 24c and 25). The genomic expansion and mas- 
sive upregulation of HSP genes help to explain why C. gigas can 
tolerate temperatures as high as 49°C when exposed to summer 
sun at low tide*’. HSP genes were also upregulated under other stressors 
and may be central to the oyster defence against all stresses (Sup- 
plementary Fig. 25). HSP genes may also inhibit apoptosis by binding 
to effector caspases™. 

Genes involved in signal transduction, including genes coding for 
G-protein-coupled receptors and Ras GTPase, were also activated by 
stressors (Supplementary Fig. 24f) and over-represented in the oyster 
genome (Supplementary Table 11). These regulators may have a 
role in orchestrating stress responses, which seem to be well coordi- 
nated (Fig. 3a and Supplementary Fig. 25). The expansion of key 
defence genes and the strong, complex transcriptomic response to 
stress highlight the sophisticated genomic adaptations of the oyster 
to sessile life in a highly stressful environment. 


Shell formation 


Calcified shells provide critical protection against predation and 
desiccation in sessile marine animals such as oysters. Molluscan shells 
consist of calcium carbonate (CaCO3) crystals of either aragonite or 
calcite embedded in an elaborate organic matrix. Two models have 
been advanced for molluscan shell formation. The matrix model 
posits that mineralization occurs in a mantle-secreted matrix of 
chitin, silk fibroin and acidic proteins****. Chitin and silk proteins 
are proposed to provide matrix structure, whereas acidic proteins 
control the nucleation and growth of CaCO3 crystals. The cellular 
model suggests that biomineralization is cell-mediated; that is, 
crystals are formed in haemocytes and then deposited at the miner- 
alization front’’. 

We searched the oyster genome for genes implicated in shell forma- 
tion in previous studies and examined their expression in different 
tissues and at different stages (Supplementary Text H1, 2). We also 
sequenced peptides from shells, mapped them to the genome and 
identified 259 shell proteins (Supplementary Text H3 and 
Supplementary Table 24). Although our search found evidence for 
the involvement of chitin, we did not find any silk-like proteins 
encoded in the oyster genome (Supplementary Text H2) but found, 
instead, many diverse proteins that may have roles in matrix con- 
struction and modification. Notably, a gene coding for a fibronectin- 
like protein was highly expressed at the early developmental stage, 
when larval shells are formed, in unison with chitin synthase (Fig. 4a) 
and was mostly expressed in the adult mantle (40× other organ 
average; Fig. 4b); the fibronectin-like protein was among the most 
abundant proteins found in oyster shells. Genes coding for laminin 
and some collagen proteins were also highly expressed in the mantle 
(Supplementary Fig. 27a) and found in shells. These are typical 
extracellular matrix (ECM) proteins, and their presence in shells 
suggests that the shell matrix has similarities to the ECM of animal 
connective tissues and basal lamina. Whereas silk fibroins can 
self-assemble**, the formation of fibronectin fibrils in the ECM is 
cell-mediated*. Oyster fibronectin-like proteins have five type-III 
domains for integrin binding and cell adhesion. Genes coding for 
integrins were highly expressed in haemocytes (4× other organ 
average, Supplementary Fig. 27b). Thus, haemocytes may organize 
fibronectin-like fibril formation in the shell matrix as they do in ECM. 

The involvement of cells in shell formation is further supported by 
the functional diversity of proteins detected in shells. Many housekeeping 
proteins, such as elongation factor 1α and ribosomal proteins, 
were found in the shell; indeed, most oyster shell proteins are not 
structural proteins but are distributed in diverse metabolic pathways 
(Fig. 4c and Supplementary Table 25). This functional diversity of 
shell proteins mirrors that of cells, which is unexpected under the 
matrix model. Furthermore, 84% of the 259 shell proteins identified 
are not classical secreted proteins (Supplementary Text H3.4 and 
Supplementary Table 24); they may be part of cells or deposited by 
exosomes”’. Supporting the presence of exosomes, 61 of the 259 shell 
proteins matched proteins in the exosome database*’. Cells and 
exosome-like vesicles containing calcite crystals have been observed 
at the mineralization front*””, although their significance in shell 
formation is debated. This study provides molecular evidence for 
their presence inside shells and their probable participation in shell 
formation. 

Many shell proteins are enzymes that may be involved in matrix 
construction or modification. A homologue of penicillin-binding 
protein is exclusively expressed in mantle (72× other organ average) 
and highly abundant in shells (Supplementary Fig. 27d). Penicillin- 
binding protein is a transpeptidase that crosslinks glycopeptides in 
bacterial cell walls* and may have similar functions in the shell 
matrix. Another notable enzyme found is tyrosinase. The oyster 
genome has an expanded set of 26 genes coding for tyrosinase, 
compared with one in Caenorhabditis elegans and two in humans; 
most genes coding for tyrosinase are mantle specific (Fig. 4d) and 
highly enriched among shell proteins (P= 8X10 °). Although 
tyrosinase is a key enzyme in melanogenesis****, it is most highly 
expressed in the non-pigmented pallial mantle (Fig. 4d), indicating 
that it has other functions in the oyster. The mantle secretes 
tyrosine-rich proteins*®, and oxidation of tyrosine may be essential for shell 
matrix maturation. Several proteinases and proteinase inhibitors are 
highly mantle specific and abundant in shells, and may be involved 
in matrix formation, modification and protection (Supplementary 
Table 24). Together, these results indicate that oyster shell matrix is 
not formed simply by self-assembling silk-like proteins but by diverse 
proteins through complex assembly and modification processes that 
may involve haemocytes and exosomes. 


Figure 4 | Genes related to shell formation identified from mass 
spectroscopy analysis of shell proteins and transcriptome data. a, Relative 
expression (y axis) of genes coding for chitin synthase (gene CGI_10009438) 
and fibronectin-like (CGI_10016964) in early development corresponds to the 
formation of shell gland and first larval shells, as seen in scanning electron 
microscope photos. White arrow denotes the invagination that forms the shell 
gland. Developmental stages (x axis) and their timeline are defined in 
Supplementary Table 12. b, In adults, chitin synthase and fibronectin-like 
proteins (same colour as in a) are almost exclusively expressed in the mantle 
compared with other organs. Fibronectin-like is also one of the most abundant 
proteins found in the shell. c, Distribution of shell proteins in diverse Kyoto 
encyclopedia of genes and genomes (KEGG) pathways indicative of general 
cellular functions. d, Expression of 26 tyrosinase genes in the mantle edge, 
mantle pallial and other organs. Tyrosinases are abundant in shells and their 
higher expression in the non-pigmented mantle pallial indicates that their 
functions are not limited to melanogenesis but are related to shell formation. 


Concluding remarks 

We sequenced and assembled the genome of the Pacific oyster using 
an inbred individual, short-read NGS and a new fosmid-pooling and 
hierarchical assembly strategy. The draft assembly provided insight 
into a molluscan genome characterized by high polymorphism, 
abundant repetitive sequences and active transposable elements. 
Genomic, transcriptomic and proteomic analyses show unique 
adaptations of oysters to sessile life in a highly stressful intertidal 
environment and the complexity of shell formation. The oyster 
genome sequence and comprehensive transcriptome data provide 
valuable resources for studying molluscan biology and lophotrochozoan 
evolution, and for genetic improvement of oysters and other important 
aquaculture species. 


METHODS SUMMARY 


The sequenced Pacific oyster is an inbred female produced by four generations of 
brother-sister mating. Genome sequences were produced with Illumina platform 
using fosmid pooling and assembled with a new hierarchical assembly strategy. 
Fosmid ends were sequenced by Sanger. Gene models were obtained by integrat- 
ing results of de novo gene prediction and alignment-based methods based on 
homology and transcriptomic evidence. Transcriptomes were sequenced with 
Illumina platform. The proteome of the shell was obtained by mass spectrometry. 
All methods are described in detail in the Supplementary Information. 
Assessment and characterization of gut microbiota has become a major research area in human disease, including type 2 
diabetes, the most prevalent endocrine disease worldwide. To carry out analysis on gut microbial content in patients 
with type 2 diabetes, we developed a protocol for a metagenome-wide association study (MGWAS) and undertook a 
two-stage MGWAS based on deep shotgun sequencing of the gut microbial DNA from 345 Chinese individuals. We 
identified and validated approximately 60,000 type-2-diabetes-associated markers and established the concept of a 
metagenomic linkage group, enabling taxonomic species-level analyses. MGWAS analysis showed that patients with 
type 2 diabetes were characterized by a moderate degree of gut microbial dysbiosis, a decrease in the abundance of some 
universal butyrate-producing bacteria and an increase in various opportunistic pathogens, as well as an enrichment of 
other microbial functions conferring sulphate reduction and oxidative stress resistance. An analysis of 23 additional 
individuals demonstrated that these gut microbial markers might be useful for classifying type 2 diabetes. 


Type 2 diabetes (T2D), which is a complex disorder influenced by both 
genetic and environmental components, has become a major public 
health issue throughout the world'’. Currently, research to parse the 
underlying genetic contributors to T2D is mainly through the use of 
genome-wide association studies (GWAS) focusing on identifying 
genetic components in the organism’s genome**. Recently, research 
has indicated that the risk of developing T2D may also involve factors 
from the ‘other genome’, that is, the ‘intestinal microbiome’ (also 
termed the gut metagenome)’. 

Previous metagenomic research on the gut metagenome, primarily 
using 16S ribosomal RNA® and whole-genome shotgun (WGS) 
sequencing’, has provided an overall picture of commensal microbial 
communities and their functional repertoire. For example, a catalogue 
of 3.3 million human gut microbial genes was established in 2010 
(ref. 8) and, of note, a more extensive catalogue of gut microorganisms 
and their genes was published later”'®. As many studies have reported, 
recent research on the gut metagenome has changed our understanding of 
human disease and of the metagenome's potential medical impact. From the 
perspective of both taxonomic and functional composition, the gut 
microbiota might be linked to and contribute to many complex 
diseases'’. For example, several studies have indicated that obesity is 
associated with an increase in the phylum Firmicutes and a relatively 
lower abundance of the phylum Bacteroidetes”'*"'°. Crohn’s disease 
research has revealed that patients had a significant reduction in 
the overall diversity of the gut microbiota’? and had changes in 
microbial composition’*, and a T2D study showed that the proportion 
of the phylum Firmicutes and the class Clostridia in the gut of patients 
was significantly reduced’®. However, more work is required to gain 
detailed information about gut microbial compositional changes and 
their impact on these types of diseases, and additional 
tools are required to determine the associated changes easily 
and rapidly. 

To reach these initial goals, we devised and carried out a two-stage 
case-control metagenome-wide association study (MGWAS) based 
on deep next-generation shotgun sequencing of DNA extracted from 
the stool samples from a total of 345 Chinese T2D patients and non- 
diabetic controls. From this we pinpointed specific genetic and func- 
tional components of the gut metagenome associated with T2D 
(Supplementary Fig. 1). Our data provide insight into the character- 
istics of the gut metagenome related to T2D risk, a paradigm for future 
studies of the pathophysiological role of the gut metagenome in other 
relevant disorders, and the potential usefulness for a gut-microbiota- 
based approach for assessment of individuals at risk of such disorders. 


Construction of a gut metagenome reference 

To identify metagenomic markers associated with T2D, we first 
developed a comprehensive metagenome reference gene set that 
included genetic information from Chinese individuals and T2D- 
specific gut microbiota, as the currently available metagenomic ref- 
erence (the MetaHIT gene catalogue) did not include such data. We 
carried out WGS sequencing on individual faecal DNA samples from 
145 Chinese individuals (71 cases and 74 controls, Supplementary 
Table 1) and obtained an average of 2.61 gigabases (Gb; 15.8 million 
paired-end reads) for each, totalling 378.4 Gb of high-quality data that 
was free of human DNA and adaptor contaminants (Supplementary 
Table 2). We then performed de novo assembly and metagenomic 
gene prediction for all 145 samples. We integrated these data with 
the MetaHIT gene catalogue, which contained 3.3 million genes that 
were predicted from the gut metagenomes of individuals of European 
descent, and obtained an updated gene catalogue with 4,267,985 pre- 
dicted genes. A total of 1,090,889 of these genes were uniquely 
assembled from our Chinese samples, which contributed 10.8% addi- 
tional coverage of sequencing reads when comparing our data against 
that from the MetaHIT gene catalogue alone (Supplementary Fig. 2). 

Having a more complete gene reference, we carried out taxonomic 
assignment and functional annotation for the updated gene catalogue 
using 2,890 reference genomes (IMG v3.4; Supplementary Table 3), 
KEGG (Release 59.0) and eggNOG databases (v3). Here, 21.3% of the 
genes in the updated catalogue could be robustly assigned to a genus, 
which covered 26.4%-90.6% (61.2% on average) of the sequencing 
reads in the 145 samples (Supplementary Methods); the remaining 
genes were likely to be from currently undefined microbial species. 
For assessment at a functional level, we identified 6,313 KEGG ortho- 
logues and 38,641 eggNOG orthologue groups in the updated gene 
catalogue, which covered 47.1% and 60.9%, respectively, of the genes in 
the catalogue. In addition, 14.0% of genes that were not mapped to 
eggNOG orthologue groups could be clustered into 7,042 novel gene 
families; however, these do not yet have any functional annotation 
information, but were still included (as in-house eggNOG orthologue 
groups) in our analyses. For each metagenomic sample, on average, 
48.7% and 68.8% of sequencing reads were covered by genes annotated with 
KEGG orthologues and eggNOG orthologue groups, respectively. 


Marker identification using a two-stage MGWAS 


To define T2D-associated metagenomic markers, we devised and 
carried out a two-stage MGWAS strategy. Using a sequence-based 
profiling method, we quantified the gut microbiota in the 145 samples 
for use in stage I. On average, with the requirement that there should 
be ≥ 90% identity, we could uniquely map 77.4 ± 0.6% (mean ± s.e.m.; 
n = 145) of paired-end reads to the updated gene catalogue (Supplementary 
Fig. 2 and Supplementary Table 2). To normalize the sequencing 
coverage, we used relative abundance instead of the raw read count 
to quantify the gut microbial genes (Supplementary Methods). With 
nearly 16 million sequencing reads on average per sample, our 
sequence-based profiling method could reliably detect very low- 
abundance genes. For example, given a gene with a real relative abund- 
ance of 1X10 °, the detected value ranged from 0.7 x 10 ° to 
1.5 X 10 ° based on a theoretical estimation (Supplementary Fig. 3). 
To facilitate the subsequent statistical analyses at both genetic and 
functional levels, we further defined and prepared three types of 
profiles using the quantified gene results: (1) a gene profile; (2) a 
KEGG orthologues profile; and (3) an eggNOG orthologue groups 
profile (Supplementary Methods). 
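
The normalization and detection-limit argument above can be sketched as follows. This is a simplified illustration under stated assumptions: relative abundance is taken as the length-normalized read count scaled to sum to 1 (the exact procedure is in the Supplementary Methods and may differ), the sampling interval uses a normal approximation to binomial read sampling, and the abundance of 1e-6 and the function names are hypothetical choices made for the example.

```python
import math


def relative_abundance(read_counts, gene_lengths):
    """Length-normalized copy number of each gene, scaled to sum to 1."""
    copies = [count / length for count, length in zip(read_counts, gene_lengths)]
    total = sum(copies)
    return [c / total for c in copies]


def detection_interval(true_fraction, total_reads, z=1.96):
    """Approximate interval for the estimated read fraction of one gene.

    Treats the read count as binomial(total_reads, true_fraction) and uses
    the normal approximation; gene-length effects are ignored here.
    """
    se = math.sqrt(true_fraction * (1.0 - true_fraction) / total_reads)
    return max(0.0, true_fraction - z * se), true_fraction + z * se


print(relative_abundance([10, 30], [1000, 1000]))
# Illustrative only: a gene at an assumed fraction of 1e-6, 16 million reads per sample.
print(detection_interval(1e-6, 16_000_000))
```

Using relative abundance rather than raw counts makes samples with different sequencing depths comparable, which is the motivation given in the text.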

We investigated the subpopulations of the 145 samples in these 
different profiles. Applying the same identification method as used 
in the MetaHIT study”’, we identified three enterotypes in our 
Chinese samples (Supplementary Figs 4 and 5). A principal component 
analysis (PCA) showed that these three enterotypes were primarily 
made up of several highly abundant genera, including Bacteroides, 
Prevotella, Bifidobacterium and Ruminococcus (Fig. 1a). However, 
we found no significant relationship between enterotype and T2D 
disease status (P = 0.29, Fisher’s exact test). We examined the top five 
principal components (P value in Tracy-Widom test < 0.05 and con- 
tribution >3%): the first and second principal components were sig- 
nificantly correlated with enterotype (P < 0.001, Kruskal-Wallis test), 
and the fifth principal component was significantly correlated with 
T2D (P<0.001, Wilcoxon rank-sum test; Supplementary Fig. 5d), 
indicating that T2D, in addition to enterotype, was a determining 
factor in explaining the gut microbial differences in our samples. 
The third and fourth principal components, however, did not correlate 
with any known factors. 


Figure 1 | Identification of T2D-associated markers from gut metagenome. 
a, The T2D patients (n = 71) and controls (n = 74) from stage I were plotted on 
the first two principal components of the genus profile. Lines connect 
individuals determined to have the same enterotype (using the PAM clustering 
method of refs 20,36), and coloured circles cover the individuals near the centre 
of gravity for each cluster (<1.5σ). The top four genera as the main 
contributors to these clusters were determined and plotted by their loadings in 
these two components. b, Density histogram showing the P-value distribution 
of all genes tested in stage I. The horizontal line represents the distribution of P 
values under the null hypothesis. c, Density histogram showing the P-value 
distribution of genes in stage II, which were identified from stage I. The blue 
and red curves denote the estimated statistical power and false discovery rate 
(FDR), respectively, for a particular P value. 

We then corrected for population stratification, which might be 
related to the non-T2D-related factors. For this we analysed our data 
using a modified EIGENSTRAT method”; however, unlike what is 
done in a GWAS subpopulation correction, we applied this analysis 
to microbial abundance rather than to genotype. For the gene profile, after 
adjustment, we found that the effects that correlated with non-T2D- 
related factors disappeared (Supplementary Table 4). A Wilcoxon 
rank-sum test was done on the adjusted gene profile to identify differ- 
ential metagenomic gene content between the T2D patients and con- 
trols. The outcome of our analyses showed a substantial enrichment of 
a set of microbial genes that had very small P values, as compared with 
the expected distribution under the null hypothesis (Fig. 1b), indi- 
cating that these genes were true T2D-associated gut microbial genes. 
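
For a single gene, the per-gene test described above can be sketched as a two-sided Wilcoxon rank-sum test on the abundances in cases and controls. The version below is a minimal pure-Python sketch that uses the normal approximation with a tie correction; it is an illustration rather than the study's implementation, the input values are hypothetical, and in practice a statistics library would normally be used.

```python
import math


def rank_sum_test(cases, controls):
    """Two-sided Wilcoxon rank-sum test (normal approximation, tie-corrected).

    Returns (z, p_value), where z < 0 means cases tend to have lower values.
    """
    n1, n2 = len(cases), len(controls)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups must be non-empty")
    n = n1 + n2
    # Pool values, remembering the group (0 = cases, 1 = controls).
    pooled = sorted((v, g) for g, values in enumerate((cases, controls)) for v in values)
    rank_sum_cases = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        mean_rank = (i + 1 + j) / 2.0          # average of ranks i+1 .. j
        size = j - i
        tie_term += size ** 3 - size
        rank_sum_cases += mean_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    mean = n1 * (n + 1) / 2.0
    variance = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        return 0.0, 1.0                        # all values identical
    z = (rank_sum_cases - mean) / math.sqrt(variance)
    return z, math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical adjusted abundances of one gene in patients and controls.
print(rank_sum_test([1, 2, 3, 4, 5], [6, 7, 8, 9, 10]))
```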

To validate the significant associations identified in stage I, we 
carried out the stage II analysis using an additional 200 Chinese 
individuals (one of these samples had a very low within-sample 
diversity, which was probably owing to the presence of a high fraction 
of Escherichia and Klebsiella, and was therefore excluded in later 
analyses; Supplementary Tables 1 and 2). We also used WGS sequen- 
cing in stage II and generated a total of 830.8 Gb sequence data with 
23.6 million paired-end reads on average per sample. We then assessed 
the 278,167 stage I genes that had P values < 0.05 and found that the 
majority of these genes still correlated with T2D in these stage II study 
samples (Supplementary Fig. 6). We next controlled for the false 
discovery rate (FDR) in the stage II analysis, and defined a total of 
52,484 T2D-associated gene markers from these genes corresponding 
to a FDR of 2.5% (stage II P value < 0.01; Fig. 1c, Supplementary Fig. 7 
and Supplementary Table 5). 
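
The two-stage logic (screen in stage I, validate in stage II under FDR control) can be sketched as below. The sketch substitutes the Benjamini-Hochberg procedure for the study's own FDR estimation, which is not described in this section, so it should be read as a stand-in; gene identifiers, P values and function names are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values (q-values) in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):               # from the largest P value down
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


def two_stage_markers(stage1_p, stage2_p, stage1_cutoff=0.05, fdr=0.025):
    """Genes that pass the stage I screen and meet the FDR threshold in stage II."""
    candidates = [g for g, p in stage1_p.items() if p < stage1_cutoff and g in stage2_p]
    q_values = benjamini_hochberg([stage2_p[g] for g in candidates])
    return [g for g, q in zip(candidates, q_values) if q <= fdr]


stage1 = {"g1": 0.001, "g2": 0.2, "g3": 0.03, "g4": 0.04}
stage2 = {"g1": 0.0001, "g2": 0.0001, "g3": 0.5, "g4": 0.01}
print(two_stage_markers(stage1, stage2))
```

Restricting the stage II tests to stage I candidates reduces the number of hypotheses tested in the validation cohort, which is the purpose of the two-stage design.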

We applied the same two-stage analysis using the KEGG orthologues 
and eggNOG orthologue groups profiles and identified a total of 1,345 
KEGG orthologues markers (stage II P < 0.05 and 4.5% FDR) and 5,612 
eggNOG orthologue groups markers (stage II P< 0.05 and 6.6% FDR) 
that were associated with T2D (Supplementary Tables 6 and 7). 


Development of a metagenomic linkage group 


To reduce and structurally organize the abundant metagenomic data 
and to enable us to make a taxonomic description, we devised the
generalized concept of metagenomic linkage group (MLG) in lieu of a
species concept for a metagenome. Here a MLG is defined as a group 
of genetic material in a metagenome that is probably physically linked 
as a unit rather than being independently distributed; this allowed us 
to avoid the need to completely determine the specific microbial 
species present in the metagenome, which is important given there 
are a large number of unknown organisms and that there is frequent 
lateral gene transfer (LGT) between bacteria. Using our gene profile, 
we defined and identified a MLG as a group of genes that co-exists 
among different individual samples and has a consistent abundance 
level and taxonomic assignment (Supplementary Methods). 

To assess the reliability of our MLG identifying method, we first 
constructed a subset of bacterial genes from the updated metagenome 
gene catalogue (m = 130,605) that were independently derived from 
50 known gut bacterial species (Supplementary Methods). We used a 
threshold for the minimum gene number for a MLG of 100, above 
which all 50 bacterial species could be identified with an average 
genome coverage of 83.0% and with an accuracy in the taxonomic 
classification of genes in the constructed subset of 99.8% (Supplemen- 
tary Fig. 8 and Supplementary Table 8). 

We identified 47 MLGs in the T2D-associated gene markers, which 
covered 84.4% of these markers (Supplementary Table 9). Of these, 17 
MLGs could be assigned to known bacterial species on the basis of 
strong alignment sequence similarity with sequenced bacterial 
genomes at the nucleotide level (Table 1). Using the taxonomic char- 
acterization from these MLGs, we found that almost all of the MLGs 
enriched in the control samples were from various butyrate- 
producing bacteria, including Clostridiales sp. SS3/4, Eubacterium 
rectale, Faecalibacterium prausnitzii, Roseburia intestinalis and 
Roseburia inulinivorans. By contrast, most of the T2D-enriched MLGs
were from opportunistic pathogens, such as Bacteroides caccae, 
Clostridium hathewayi, Clostridium ramosum, Clostridium symbiosum, 
Eggerthella lenta and Escherichia coli, which have previously been 
reported to cause or underlie human infections such as bacteraemia
and intra-abdominal infections. Of interest, the known mucin-degrading
species Akkermansia muciniphila and sulphate-reducing
species Desulfovibrio sp. 3_1_syn3 were also enriched in T2D samples. 
The MLGs that were of unknown species origin will be of interest for 
isolation and analysis in future studies to obtain information on their 
relevant taxonomy. 

A co-occurrence network on these MLGs was generated to assess 
potential relationships between the T2D-associated gut bacteria 
(Fig. 2a and Supplementary Methods). In this result, some types of 
butyrate-producers, from clostridial cluster XIVa and IV, showed a 
positive correlation with one another and were negatively correlated 
with a group of the T2D-enriched bacteria from Clostridium, which 
may indicate an antagonistic relationship between these different 
clostridial clusters. Another interesting finding was the presence of 
a small MLG from Haemophilus parainfluenzae, which is not a butyrate-producer
but was significantly enriched in the control samples, even in
an independent analysis comparing the coverage of its sequenced
bacterial genome (the highest genome coverage in all samples was
94.5%; P < 0.001 between case and control groups, Student’s t-test).
In the co-occurrence network, this MLG was clearly separate from the 
cluster of butyrate producers, and may have an unknown antagonistic 
relationship with a T2D-enriched bacterium that is unknown but 
appears closely related to the Subdoligranulum genus. These data 
presented various patterns indicating relationships between the 
T2D-associated gut bacteria and suggested it may be important to 
determine, in a case-by-case manner, the different roles gut bacteria 
may have in maintaining or interacting with their environment. 
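A co-occurrence network of this kind can be sketched as pairwise Spearman correlations between MLG abundance profiles, keeping an edge only when the coefficient is above 0.4 or below −0.4 (the cutoffs given in the Figure 2 legend). Everything else in the sketch is an assumption for illustration: the function names, the dictionary input format and the toy profiles are hypothetical, and the procedure in the Supplementary Methods may differ.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j < len(order) and values[order[j]] == values[order[i]]:
            j += 1
        avg = (i + 1 + j) / 2.0
        for k in range(i, j):
            ranks[order[k]] = avg
        i = j
    return ranks

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)

def spearman(a, b):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(a), average_ranks(b))

def co_occurrence_edges(abundance, cutoff=0.4):
    """abundance: {mlg_id: [abundance per sample]} (hypothetical format)."""
    names = sorted(abundance)
    edges = []
    for i, u in enumerate(names):
        for v in names[i + 1:]:
            rho = spearman(abundance[u], abundance[v])
            if rho > cutoff:
                edges.append((u, v, "positive"))
            elif rho < -cutoff:
                edges.append((u, v, "negative"))
    return edges

# Toy profiles across five samples; the IDs are placeholders, not real MLGs.
toy = {"A": [1, 2, 3, 4, 5], "B": [2, 4, 6, 8, 11],
       "C": [5, 4, 3, 2, 1], "D": [2, 5, 1, 4, 3]}
print(co_occurrence_edges(toy))
```

Because Spearman correlation depends only on ranks, it is insensitive to the large differences in absolute abundance between MLGs, which is one plausible reason for preferring it here.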


Functional characterization related to T2D 


Using the T2D-associated KEGG orthologues and eggNOG ortholo- 
gue groups markers, we assessed the potential microbial functional 
roles in the gut microbiota of T2D patients. In general, T2D-enriched
markers were typically involved in the KEGG categories of membrane 
transport (P < 0.001, Fisher’s exact test). This result is consistent with 


Table 1 | The list of T2D-associated MLGs that could be assigned to previously known phylotypes

MLG ID    No. of genes  P value* (stage I)  P value* (stage II)  Odds ratio (95% CI)†  Taxonomy assignment (level)     Similarity (%)‡

T2D-enriched
T2D-154   337           0.0014              2.54 × 10⁻⁴          1.52 (1.05, 2.19)     Akkermansia muciniphila         98.2
T2D-140   148           3.97 × 10⁻⁴         0.0029               1.50 (1.15, 1.97)     Bacteroides intestinalis        98.2
T2D-139   3,386         0.0013              2.11 × 10⁻⁴          1.66 (1.26, 2.20)     Bacteroides sp. 20_3            99.3
T2D-11    5,113         4.16 × 10⁻⁸         7.58 x 10°°          5.89 (1.39, 25.0)     Clostridium bolteae             99.4
T2D-5     23/8          4.21 x10°°          1.97 x 10°°          23.1 (2.08, 257)      Clostridium hathewayi           99.3
T2D-80    2,381         1.30 × 10⁻⁴         1.41 x10°°           1.68 (0.97, 2.89)     Clostridium ramosum             99.8
T2D-57    821           4.00 x 10°’         2.21 x10°°           2.62 (1.14, 6.03)     Clostridium sp. HGF2            99.6
T2D-15    2,492         4.74 x10°°          2.97 × 10⁻⁴          1.13 (0.88, 1.44)     Clostridium symbiosum           99.6
T2D-1     949           6.01 × 10⁻⁴         0.0036               1.41 (0.93, 2.13)     Desulfovibrio sp. 3_1_syn3      98.0
T2D-7     1,056         6.01 × 10⁻⁴         2.80 × 10⁻⁴          1.57 (0.95, 2.58)     Eggerthella lenta               99.6
T2D-137   425           6.71 x 1077         0.0012               1.72 (1.16, 2.57)     Escherichia coli                99.0
T2D-165   131           0.0096              0.0017               1.46 (1.07, 1.99)     Alistipes (genus)               99.5§
T2D-12    364           4.52 x10            8.04 × 10⁻⁸          2.22 (1.12, 4.40)     Clostridium (genus)             91.0
T2D-8     5,272         7.08 x 1071         9.95 x 10°           1.12 (0.86, 1.45)     Clostridium (genus)             88.8
T2D-93    1,590         2.01 × 10⁻⁴         0.0020               1.84 (1.03, 3.29)     Parabacteroides (genus)         80.5§
T2D-62    2,584         7.63 x10 ©          6.88 × 10⁻⁴          2.41 (1.43, 4.08)     Subdoligranulum (genus)         98.7§
T2D-2     2,430         3.14 x10°°          0.0019               4.06 (1.28, 12.9)     Lachnospiraceae (family)        97.3§

Control-enriched
Con-107   1,677         112 «1077           0.0018               1.44 (1.13, 1.84)     Clostridiales sp. SS3/4         98.0
Con-112   232           0.0064              1.99 × 10⁻⁴          1.51 (1.13, 2.03)     Eubacterium rectale             97.6
Con-129   1,440         0.0033              0.0010               1.55 (1.19, 2.00)     Faecalibacterium prausnitzii    98.2
Con-166   273           3.80 x10°°          1.94 × 10⁻⁴          1.25 (0.93, 1.69)     Haemophilus parainfluenzae      94.8
Con-121   3,507         6.11 x10°°          4.90 x 10°           3.10 (1.92, 5.03)     Roseburia intestinalis          98.9
Con-113   345           2.85 × 10⁻⁴         9.72 × 10⁻⁴          1.45 (1.11, 1.89)     Roseburia inulinivorans         98.2
Con-120   116           1.90 × 10⁻⁴         5.41 × 10⁻⁴          1.55 (1.17, 2.06)     Eubacterium (genus)             89.0
Con-130   670           0.0134              0.0018               1.59 (1.21, 2.08)     Faecalibacterium (genus)        89.4
Con-131   202           8.99 × 10⁻⁴         0.0017               1.58 (1.16, 2.15)     Faecalibacterium (genus)        96.9
Con-133   1,555         3.43 x107-°         0.0015               1.52 (1.15, 2.01)     Erysipelotrichaceae (family)    66.9§
Con-109   378           0.0135              1.67 × 10⁻⁴          1.41 (1.09, 1.83)     Clostridiales (order)           87.0

* The stage I P value was calculated after adjustment for population structure; the stage II P value was one-sided.
† Calculated using a logistic model.
‡ Similarity at the nucleic acid level or, when marked with §, at the protein level.

Figure 2 | Taxonomic and functional characterization of gut microbiota in
T2D. a, A co-occurrence network was deduced from 47 MLGs that were 
identified from 52,484 gene markers. Nodes depict MLGs with their ID 
displayed in the centre. The size of the nodes indicates gene number within the 
MLG. The colour of the nodes indicates their taxonomic assignment. 
Connecting lines represent Spearman correlation coefficient values above 0.4 


the previous findings in studies of inflammatory bowel disease and
obese patients. By contrast, control-enriched markers were frequently
involved in cell motility and metabolism of cofactors and
vitamins (P < 0.002; Supplementary Fig. 9). 
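The category-level enrichment reported here rests on Fisher's exact test. As a minimal sketch, the one-sided version below asks how likely it is to see at least the observed number of markers in a functional category if markers were drawn at random from all annotated orthologues. The one-sided form, the function name `fisher_enrichment_p` and the toy counts are assumptions for illustration; the text does not state the sidedness or the background set used.

```python
from math import comb

def fisher_enrichment_p(hits_in_category, n_markers, n_in_category, n_total):
    """One-sided Fisher's exact test for over-representation.

    Returns P(X >= hits_in_category), where X is hypergeometric:
    n_markers draws from n_total items, n_in_category of which
    belong to the category of interest.
    """
    upper = min(n_markers, n_in_category)
    denom = comb(n_total, n_markers)
    num = sum(
        comb(n_in_category, k) * comb(n_total - n_in_category, n_markers - k)
        for k in range(hits_in_category, upper + 1)
    )
    return num / denom

# Toy counts: 8 orthologues in total, 4 in the category, 4 markers,
# all 4 markers falling in the category.
print(fisher_enrichment_p(4, 4, 4, 8))
```

Exact integer arithmetic via `math.comb` (Python 3.8 or later) avoids the rounding problems that factorial-based floating-point formulas show for large counts.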

At the module or pathway level, the gut microbiota of T2D patients 
was functionally characterized with our T2D-associated markers and 
showed enrichment in membrane transport of sugars, branched-chain 
amino acid (BCAA) transport, methane metabolism, xenobiotics 
degradation and metabolism, and sulphate reduction. By contrast, 
there was a decrease in the level of bacterial chemotaxis, flagellar 
assembly, butyrate biosynthesis and metabolism of cofactors and 
vitamins (Fig. 2b and Supplementary Table 10; see Supplementary 
Fig. 10 for the detailed information on butyrate-CoA transferase). 
Some important functions, including butyrate biosynthesis and sul- 
phate reduction, coincided with the T2D-associated bacteria identified 
in the MLG analysis. The butyrate-producing bacteria seemed to be the 
primary contributors to the cell motility functions (Supplementary 
Table 11), potentially indicating some functional enrichment might 
be related to the presence of specific species enrichment. 

We found that seven of the T2D-enriched KEGG orthologues 
markers were related to oxidative stress resistance, including catalase 
(K03781), peroxiredoxin (K03386), Mn-containing catalase (K07217), 
glutathione reductase (NADPH) (K00383), nitric oxide reductase 
(K02448), putative iron-dependent peroxidase (K07223), and cytochrome c
peroxidase (K00428), but none of the identified control-enriched
KEGG orthologue markers had similar types of function.
(blue) or below −0.4 (red). b, A schematic diagram showing the main functions
of the gut microbes that had a predicted T2D association. Red text denotes 
enriched functions in T2D patients; blue text denotes depleted functions in 
T2D patients; black text denotes an uncertain functional role relative to T2D. 
The dashed line arrows point to the inference that was not detected directly but 
reported by previous studies. 


This may indicate that the gut environment of a T2D patient is one that
stimulates bacterial defence mechanisms against oxidative stress 
(Supplementary Table 10). Similarly, we found 14 KEGG orthologues 
markers related to drug resistance that were greatly enriched in T2D 
patients, further supporting that T2D patients may have a more hostile 
gut environment, and the medical histories of these patients may reflect 
this (Supplementary Table 10). 


T2D-related dysbiosis in gut microbiota 

In light of the above MGWAS result and an additional
PERMANOVA (permutational multivariate analysis of variance)
analysis that clearly showed that T2D was a significant factor for 
explaining the variation in the examined gut microbial samples 
(Supplementary Table 12), we deduced that the gut microbiota in 
T2D patients featured dysbiosis, which is a state where the balance 
of the normal microbiota has been disturbed. However, the degree of 
this T2D-related dysbiosis was moderate, because only 3.8 ± 0.2%
(mean ± s.e.m.; n = 344) of the gut microbial genes (at the relative
abundance level) were associated with T2D in an individual.
Additionally, we did not observe a significant difference in the 
within-sample diversity between T2D and control groups (Fig. 3a). 
Specifically, the degree of gut microbiota change in T2D was not as 
substantial as that seen in inflammatory bowel disease (from the
MetaHIT samples; see Fig. 3a) or enterotypes (Supplementary Fig. 11).
A similar result using the eggNOG orthologue groups profile supported
the same conclusion (Supplementary Fig. 12).

Figure 3 | Gut microbiota of T2D patients show a moderate degree of
dysbiosis. a, An ecological comparison between T2D patients (n = 170) and
controls (n = 174) in all samples, as well as inflammatory bowel disease (IBD)
patients (n = 25) and controls (n = 99) from published MetaHIT samples. The
upward bars denote the gross relative abundance of the T2D-associated gene 
markers for each sample and the same value computed on the inflammatory- 
bowel-disease-associated gene markers (see Supplementary Methods). The 
downward bars denote the within-sample diversity (calculated using the 
Shannon index) in each group. For an individual sample, a lower proportion of 
gut microbiota was implicated in T2D disease and there was no significant 
difference in the within-sample diversity between the T2D patients and control 
as compared with the distinct difference seen in the inflammatory bowel disease 
analysis. **P < 0.01; ***P < 0.001 (Student’s t-test); NS, not significant;
error bars denote the standard error. b, A density histogram showing a
comparison of the occurrence rate distribution between T2D-enriched gene 
markers and control-enriched gene markers in all samples (m = 344). The 
threshold of mapped read number for gene identification is = 2. 
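The within-sample diversity in Fig. 3a is stated to be a Shannon index. A minimal sketch of that calculation from one sample's gene (or taxon) abundances is given below; the use of the natural logarithm, the function name `shannon_index` and the toy abundance vectors are assumptions, since the legend does not specify the log base or the input profile.

```python
import math

def shannon_index(abundances):
    """Shannon diversity H = -sum(p_i * ln p_i) over non-zero abundances."""
    total = sum(abundances)
    if total <= 0:
        raise ValueError("abundances must sum to a positive value")
    return -sum((a / total) * math.log(a / total) for a in abundances if a > 0)

# Toy samples: an even community scores higher than a skewed one.
print(shannon_index([5, 5, 5, 5]))
print(shannon_index([17, 1, 1, 1]))
```

A sample dominated by a few taxa, such as the excluded stage II sample with a high fraction of Escherichia and Klebsiella, would give a low value under this definition.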


To characterize ecologically the gut bacteria involved in the T2D- 
related dysbiosis, we compared, in all individual samples, the distri- 
bution of the occurrence rate of both T2D-associated gene and func- 
tion markers, and these showed the same pattern, which was that the 
control-enriched markers had a higher occurrence rate on average 
than the T2D-enriched markers (Fig. 3b and Supplementary Figs 13- 
15). This may be because the beneficial bacteria lost in the T2D gut 
were universally present, whereas some of the harmful bacteria that 
appeared in the T2D gut were diverse, and thus had less overall 
abundance within the human population. 


Gut-microbiota-based T2D classification 


To exploit the potential ability of T2D classification by gut microbiota, 
we developed a T2D classifier system based on the 50 gene markers that 
we defined as an optimal gene set by a minimum redundancy–maximum
relevance (mRMR) feature selection method (Supplementary Fig. 16 
and Supplementary Table 13). For intuitive evaluation of the risk of 
T2D disease based on these 50 gut microbial gene markers, we computed 
a T2D index (Supplementary Methods), which correlated well with the 
ratio of T2D patients in our population (Fig. 4a), and the area under the 
receiver operating characteristic (ROC) curve was 0.81 (95% confidence 
interval 0.76-0.85) (Fig. 4b), indicating the gut-microbiota-based T2D 
index could be used to classify T2D individuals accurately. 
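The area under the ROC curve quoted above has a simple interpretation: it is the probability that a randomly chosen patient receives a higher index than a randomly chosen control, with ties counted as one half. The sketch below computes it directly from that definition. The function name `roc_auc` and the toy index values are hypothetical; the T2D index itself is defined in the Supplementary Methods and is not reproduced here.

```python
def roc_auc(case_scores, control_scores):
    """AUC as the fraction of (case, control) pairs ranked correctly.

    Ties contribute 0.5, which makes this equal to the trapezoidal
    area under the empirical ROC curve.
    """
    wins = 0.0
    for s in case_scores:
        for c in control_scores:
            if s > c:
                wins += 1.0
            elif s == c:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))

# Toy index values for three "patients" and three "controls".
print(roc_auc([2.1, 0.4, 3.0], [-1.0, 0.9, -0.2]))
```

An AUC of 0.5 corresponds to an index that carries no information about disease status, and 1.0 to perfect separation of the two groups.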

We validated the discriminatory power of our T2D classifier using 
an independent study group: 11 T2D patients and 12 non-diabetic 
controls. In this assessment analysis, the top eight samples with the 
highest T2D index were all T2D patients (Fig. 4c and Supplementary 
Table 14); the average T2D index between case and control was sig- 
nificantly different (P = 0.004, Student’s t-test). Overall, our cross- 
sectional study in overt T2D indicated that it would be worthwhile to 
test more extensively gut-microbiota-based classifiers in future lon- 
gitudinal studies for their ability to identify subsets of the population 
that are at high risk for progressing to clinically defined T2D. 


Figure 4 | A trial classification of T2D using gut microbial gene markers. 
a, A classifier to identify T2D individuals was constructed using 50 gene 
markers selected by mRMR, and then, for each individual, a T2D index was 
calculated to evaluate the risk of T2D. The histogram shows the distribution of 
T2D indices for all individuals, in which values less than −1.5 and values greater
than 3.5 were grouped. For each bin, the black dots show the proportion of T2D 
patients in the population of that bin (y axis on the right). b, The area under the 
ROC curve (AUC) of gut-microbiota-based T2D classification. The black bars 
denote the 95% confidence interval (CI) and the area between the two outside 
curves represents the 95% CI shape. c, The T2D index was computed for an 
additional 11 Chinese T2D samples and 12 non-diabetic controls. The box 
depicts the interquartile range (IQR) between the first and third quartiles (25th 
and 75th percentiles, respectively) and the line inside denotes the median, 
whereas the points represent the T2D index in each sample. 


Discussion 

T2D is a heterogeneous and multifactorial disease, influenced by a 
number of different genetic and environmental factors. By applying 
the standard two-stage GWAS strategy to design and carry out a 
MGWAS to identify disease-associated metagenomic markers, the 
present study highlights how the gut microbial composition,
traditionally considered to be a factor of environmental origin, differs
between T2D patients and non-diabetic control subjects in a
Chinese population. 

We first established an updated human microbial gene reference set, 
adding information from both a new ethnicity and from T2D patients, 
which will be a useful resource for future metagenomic analyses. We 
also developed the concept of a MLG, which provided various types of 
taxonomic information from whole-genome shotgun data, including 
bacterial species-specific regions on a chromosome, and mobile genetic 
elements, such as plasmids and bacteriophages. Thus, a MLG can 
provide metagenomic species-level information even for unknown 
species, instead of requiring traditional taxonomic classification 
approaches based on sequence composition or similarity. The use
of species-level information allows assessment of the relationships 
between the T2D-associated bacteria. For example, we identified what 
appears to be an antagonistic relationship between beneficial bacteria 
and harmful bacteria, highlighted by the large populations of clostridial 
clusters. These species-level analyses also showed various patterns: for 
example, the MLG from Haemophilus parainfluenzae in the control 
samples could be inferred, under these circumstances, to be beneficial; 
however, on the basis of relationship patterns, it was quite distinct from 
the other inferred beneficial bacteria, indicating that H. parainfluenzae 
may have a different type of impact in this specific biological context 
(Fig. 2a). 

Our findings indicated that T2D patients had only a moderate
degree of gut bacterial dysbiosis; however, functional annotation analyses
indicated a decline in butyrate-producing bacteria, which may be
metabolically beneficial, and an increase in several opportunistic
pathogens. Importantly, the abundance of these categories of
opportunistic pathogens seemed to be quite diverse among our 
Chinese study participants. Such changes in the intestinal bacteria 
composition have recently been reported for colorectal cancer
patients and the ageing population. Thus, a general picture is emerging
where butyrate-producing bacteria seem to have a protective role 
against several types of diseases. Additionally, our finding of a general 
dysbiosis in T2D patients raises the possibility that there is a ‘func- 
tional dysbiosis’, rather than there being a specific microbial species 
that has a direct association with T2D pathophysiology. Furthermore, 
given that other intestinal diseases show a loss of butyrate-producing 
bacteria with a commensurate increase in opportunistic pathogens, it 
is possible that dysbiosis that results in a disordered, rather than 
directional, alteration of gut microbial composition may itself have 
a role in increasing the susceptibility to a variety of diseases. 

Our analysis of bacterial gene functions indicating there was an 
increase in functions relating to gut oxidative stress response is also 
of interest, given that previous studies have shown that a high 
oxidative stress level is related to a predisposition for diabetic complications.
Finally, our finding that gut metagenomic markers are
able to differentiate between T2D cases and controls with a higher
level of specificity than similar analyses based on human genome
variation raises the possibility of a mode of monitoring gut health
and a complementary approach for risk assessment of this common
disorder.


METHODS SUMMARY 
Sample collection and DNA extraction. Faecal samples were obtained from 368 
volunteers (345 samples for MGWAS and 23 additional samples for T2D clas- 
sification) after signing an informed consent form. The sampling procedure was 
approved by the Ethical Committee for Clinical Research from the Peking 
University Shenzhen Hospital, Shenzhen Second People’s Hospital and 
Medical Research Center of Guangdong General Hospital. The individuals had 
not received any antibiotic treatment within 2 months before sample collection. 
The samples were frozen immediately and underwent DNA extraction using 
standard methods.
Sequencing and data processing. Illumina GAIIx and HiSeq 2000 were used to 
sequence the samples. We constructed a paired-end library with insert size of 
~350 base pairs for every sample. Adaptor contamination and low-quality reads 
were discarded from the raw reads, and the remaining reads were filtered to 
eliminate human host DNA based on the human genome reference (hg18). 
Full Methods and associated references are available in the Supplementary 
Information.


Comprehensive molecular portraits of
human breast tumours 


The Cancer Genome Atlas Network* 


We analysed primary breast cancers by genomic DNA copy number arrays, DNA methylation, exome sequencing, 
messenger RNA arrays, microRNA sequencing and reverse-phase protein arrays. Our ability to integrate information 
across platforms provided key insights into previously defined gene expression subtypes and demonstrated the existence
of four main breast cancer classes when combining data from five platforms, each of which shows significant molecular 
heterogeneity. Somatic mutations in only three genes (TP53, PIK3CA and GATA3) occurred at >10% incidence across all 
breast cancers; however, there were numerous subtype-associated and novel gene mutations including the enrichment 
of specific mutations in GATA3, PIK3CA and MAP3K1 with the luminal A subtype. We identified two novel 
protein-expression-defined subgroups, possibly produced by stromal/microenvironmental elements, and integrated 
analyses identified specific signalling pathways dominant in each molecular subtype including a HER2/phosphorylated 
HER2/EGFR/ phosphorylated EGFR signature within the HER2-enriched expression subtype. Comparison of basal-like 
breast tumours with high-grade serous ovarian tumours showed many molecular commonalities, indicating a related 
aetiology and similar therapeutic opportunities. The biological finding of the four main breast cancer subtypes caused by 
different subsets of genetic and epigenetic abnormalities raises the hypothesis that much of the clinically observable
plasticity and heterogeneity occurs within, and not across, these major biological subtypes of breast cancer.


Breast cancer is one of the most common cancers with greater than 
1,300,000 cases and 450,000 deaths each year worldwide. Clinically, 
this heterogeneous disease is categorized into three basic therapeutic 
groups. The oestrogen receptor (ER) positive group is the most 
numerous and diverse, with several genomic tests to assist in predicting
outcomes for ER+ patients receiving endocrine therapy. The
HER2 (also called ERBB2) amplified group is a great clinical success
because of effective therapeutic targeting of HER2, which has led to
intense efforts to characterize other DNA copy number aberrations.
Triple-negative breast cancers (TNBCs, lacking expression of ER,
progesterone receptor (PR) and HER2), also known as basal-like
breast cancers, are a group with only chemotherapy options, and
have an increased incidence in patients with germline BRCA1 mutations
or of African ancestry.

Most molecular studies of breast cancer have focused on just one or 
two high information content platforms, most frequently mRNA 
expression profiling or DNA copy number analysis, and more 
recently massively parallel sequencing. Supervised clustering of
mRNA expression data has reproducibly established that breast
cancers encompass several distinct disease entities, often referred to
as the intrinsic subtypes of breast cancer. The recent development of
additional high information content assays focused on abnormalities
in DNA methylation, microRNA (miRNA) expression and protein
expression provides further opportunities to characterize more completely
the molecular architecture of breast cancer. In this study, a
diverse set of breast tumours was assayed using six different technology
platforms. Individual platform and integrated pathway analyses iden- 
tified many subtype-specific mutations and copy number changes that 
identify therapeutically tractable genomic aberrations and other events 
driving tumour biology. 


Samples and clinical data 


Tumour and germline DNA samples were obtained from 825 
patients. Different subsets of patients were assayed on each platform:
466 tumours from 463 patients had data available on five platforms
including Agilent mRNA expression microarrays (n = 547), Illumina
Infinium DNA methylation chips (n = 802), Affymetrix 6.0 single
nucleotide polymorphism (SNP) arrays (n = 773), miRNA sequencing 
(n = 697), and whole-exome sequencing (n = 507); in addition, 348 of 
the 466 samples also had reverse-phase protein array (RPPA) data 
(n = 403). Owing to the short median overall follow up (17 months) 
and the small number of overall survival events (93 out of 818), survival 
analyses will be presented in a later publication. Demographic and 
clinical characteristics are presented in Supplementary Table 1. 


Significantly mutated genes in breast cancer 


Overall, 510 tumours from 507 patients were subjected to whole- 
exome sequencing, identifying 30,626 somatic mutations comprised 
of 28,319 point mutations, 4 dinucleotide mutations, and 2,302 
insertions/deletions (indels) (ranging from 1 to 53 nucleotides). The 
point mutations included 6,486 silent, 19,045 missense, 1,437 
nonsense, 26 read-through, 506 splice-site mutations, and 819 muta- 
tions in RNA genes. Comparison to COSMIC and OMIM databases 
identified 619 mutations across 177 previously reported cancer genes. 
Of 19,045 missense mutations, 9,484 were predicted to have a high 
probability of being deleterious by Condel. The MuSiC package,
which determines the significance of the observed mutation rate of 
each gene based on the background mutation rate, identified 35 sig- 
nificantly mutated genes (excluding LOC or Ensembl gene IDs) by at 
least two tests (convolution and likelihood ratio tests) with false dis- 
covery rate (FDR) <5% (Supplementary Table 2). 

In addition to identifying nearly all genes previously implicated in 
breast cancer (PIK3CA, PTEN, AKT1, TP53, GATA3, CDH1, RB1, 
MLL3, MAP3K1 and CDKN1B), a number of novel significantly 
mutated genes were identified including TBX3, RUNX1, CBFB, AFF2, 
PIK3R1, PTPN22, PTPRD, NF1, SF3B1 and CCND3. TBX3, which is 
mutated in ulnar-mammary syndrome and involved in mammary 
gland development, harboured 13 mutations (8 frame-shift indels,


*A list of participants and their affiliations appears at the end of the paper.
1 in-frame deletion, 1 nonsense, and 3 missense), suggesting a loss of
function. Additionally, 2 mutations were found in TBX4 and 1 mutation
in TBX5, which are genes involved in Holt-Oram syndrome.
Two other transcription factors, CTCF and FOXA1, were at or near 
significance harbouring 13 and 8 mutations, respectively. RUNX1 and 
CBFB, both rearranged in acute myeloid leukaemia and interfering 
with haematopoietic differentiation, harboured 19 and 9 mutations, 
respectively. PIK3R1 contained 14 mutations, most of which clustered 
in the PIK3CA interaction domain similar to previously identified 
mutations in glioma and endometrial cancer. We also observed a
statistically significant exclusion pattern among PIK3R1, PIK3CA,
PTEN and AKT1 mutations (P = 0.025). Mutation of splicing factor
SF3B1, previously described in myelodysplastic syndromes and
chronic lymphocytic leukaemia, was significant with 15 non-silent
mutations, of which 4 were a recurrent K700E substitution. Two
protein tyrosine phosphatases (PTPN22 and PTPRD) were also significantly
mutated; frequent deletion/mutation of PTPRD is observed in
lung adenocarcinoma.


Mutations and mRNA-expression subtype associations 


We analysed the somatic mutation spectrum within the context of the 
four mRNA-expression subtypes, excluding the normal-like group 
owing to small numbers (n = 8) (Fig. 1). Several significantly mutated 
genes showed mRNA-subtype-specific (Supplementary Figs 1-3) and 
clinical-subtype-specific patterns of mutation (Supplementary Table 2). 
Significantly mutated genes were considerably more diverse and 
recurrent within luminal A and luminal B tumours than within 
basal-like and HER2-enriched (HER2E) subtypes; however, the overall 
mutation rate was lowest in luminal A subtype and highest in the basal- 
like and HER2E subtypes. The luminal A subtype harboured the most 
significantly mutated genes, with the most frequent being PIK3CA 
(45%), followed by MAP3K1, GATA3, TP53, CDH1 and MAP2K4. 
Twelve per cent of luminal A tumours contained likely inactivating 
mutations in MAP3K1 and MAP2K4, which represent two contiguous
steps in the p38-JNK1 stress kinase pathway. Luminal B cancers
exhibited a diversity of significantly mutated genes, with TP53 and 
PIK3CA (29% each) being the most frequent. The luminal tumour 
subtypes markedly contrasted with basal-like cancers where TP53 
mutations occurred in 80% of cases and the majority of the luminal 
significantly mutated gene repertoire, except PIK3CA (9%), were 
absent or near absent. The HER2E subtype, which has frequent 
HER2 amplification (80%), had a hybrid pattern with a high frequency 
of TP53 (72%) and PIK3CA (39%) mutations and a much lower fre- 
quency of other significantly mutated genes including PIK3R1 (4%). 

Intrinsic mRNA subtypes differed not only by mutation frequencies 
but also by mutation type. Most notably, TP53 mutations in basal-like 
tumours were mostly nonsense and frame shift, whereas missense 
mutations predominated in luminal A and B tumours (Supplemen- 
tary Fig. 1). Fifty-eight somatic GATA3 mutations, some of which were 
previously described, were detected including a hotspot 2-base-pair
deletion within intron 4 only in the luminal A subtype (13 out of 13 
mutants) (Supplementary Fig. 2). In contrast, 7 out of 9 frame-shift 
mutations in exon 5 (DNA binding domain) occurred in luminal B 
cancers. PIK3CA mutation frequency and spectrum also varied by 
mRNA subtype (Supplementary Fig. 3); the recurrent PIK3CA 
E545K mutation was present almost exclusively within luminal A 
(25 out of 27) tumours. CDH1 mutations were common (30 out of 
36) within the lobular histological subtype and corresponded with 
lower CDH1 mRNA (Supplementary Fig. 4) and protein expression. 
Finally, we identified 4 out of 8 somatic variants in HER2 within lobular 
cancers, three of which were within the tyrosine kinase domain. 

We performed analyses on a selected set of genes using the normal
tissue DNA data and detected a number of germline predisposing 
variants. These analyses identified 47 out of 507 patients with 
deleterious germline variants, representing nine different genes 
(ATM, BRCA1, BRCA2, BRIP1, CHEK2, NBN, PTEN, RAD51C and 
TP53; Supplementary Table 3), supporting the hypothesis that ~10% 
of sporadic breast cancers may have a strong germline contribution.

Figure 1 | Significantly mutated genes and correlations with genomic and
clinical features. Tumour samples are grouped by mRNA subtype: luminal A 
(n = 225), luminal B (n = 126), HER2E (n = 57) and basal-like (n = 93). The 
left panel shows non-silent somatic mutation patterns and frequencies for 
significantly mutated genes. The middle panel shows clinical features: dark 
grey, positive or T2-4; white, negative or T1; light grey, N/A or equivocal. 

These data confirmed the association between the presence of
germline BRCA1 mutations and basal-like breast cancers”®. 
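
As a quick check of the ~10% figure quoted above, the germline carrier fraction (47 of 507 patients) can be computed together with a confidence interval. The sketch below uses a Wilson score interval; that choice is an assumption of this example, as the text does not say whether or how an interval was calculated.

```python
import math

def wilson_interval(successes, total, z=1.96):
    """Return (proportion, lower, upper) using the Wilson score interval."""
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return p, centre - half, centre + half

# 47 of 507 patients carried a deleterious germline variant (counts from the text).
fraction, lower, upper = wilson_interval(47, 507)
print(f"carrier fraction = {fraction:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

The interval gives a sense of how much the "~10%" estimate could move with a cohort of this size.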


Gene expression analyses (mRNA and miRNA) 


Several approaches were used to look for structure in the mRNA 
expression data. We performed an unsupervised hierarchical cluster- 
ing analysis of 525 tumours and 22 tumour-adjacent normal tissues 
using the top 3,662 variably expressed genes (Supplementary Fig. 5); 
SigClust analysis identified 12 classes (5 classes with >9 samples per 
class). We performed a semi-supervised hierarchical cluster analysis 
using a previously published ‘intrinsic gene list’, which identified 13 
classes (9 classes with >9 samples per class) (Supplementary Fig. 6). 
We also classified each sample using the 50-gene PAM50 model’* 
(Supplementary Fig. 5). High concordance was observed between 
all three analyses; therefore, we used the PAM50-defined subtype 
predictor as a common classification metric. There were only eight 
normal-like and eight claudin-low tumours”’, thus we did not per- 
form focussed analyses on these two subtypes. 
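
Centroid-based predictors assign each sample to the subtype whose reference expression profile it most resembles. The following is a minimal sketch of that idea only, assuming Pearson correlation as the similarity measure; the gene values and the two centroids are hypothetical and are not the published PAM50 centroids.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def classify(sample, centroids):
    """Assign the subtype whose centroid correlates best with the sample."""
    scores = {name: pearson(sample, c) for name, c in centroids.items()}
    return max(scores, key=scores.get), scores

# Hypothetical centroids over five genes (illustrative values only).
toy_centroids = {
    "luminal A": [2.0, 1.5, -1.0, -1.5, 0.2],
    "basal-like": [-2.0, -1.5, 1.8, 1.2, 0.1],
}
toy_sample = [1.7, 1.1, -0.6, -1.2, 0.0]

subtype, scores = classify(toy_sample, toy_centroids)
print(subtype, scores)
```

A real classifier would use the full gene list and all subtype centroids; the structure of the calculation stays the same.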

MicroRNA expression levels were assayed via Illumina sequencing, 
using 1,222 miRBase** v16 mature and star strands as the reference 
database of miRNA transcripts/genes. Seven subtypes were identified 
by consensus non-negative matrix factorization (NMF) clustering 
using an abundance matrix containing the 25% most variable 
miRNAs (306 transcripts/genes or MIMATs (miRNA IDs)). These 
subtypes correlated with mRNA subtypes, ER, PR and HER2 clinical
status (Supplementary Fig. 7). Of note, miRNA groups 4 and 5 
showed high overlap with the basal-like mRNA subtype and con- 
tained many TP53 mutations. The remaining miRNA groups (1-3, 
6 and 7) were composed of a mixture of luminal A, luminal B and 
HER2E with little correlation with the PAM50 defined subtypes. With 
the exception of TP53—which showed a strong positive correlation— 
and PIK3CA and GATA3—which showed negative associations with 
groups 4 and 5, respectively—there was little correlation with muta- 
tion status and miRNA subtype. 


DNA methylation 

Illumina Infinium DNA methylation arrays were used to assay 802
breast tumours. Data from HumanMethylation27 (HM27) and 
HumanMethylation450 (HM450) arrays were combined and filtered 
to yield a common set of 574 probes used in an unsupervised clustering 
analysis, which identified five distinct DNA methylation groups 
(Supplementary Fig. 8). Group 3 showed a hypermethylated pheno- 
type and was significantly enriched for luminal B mRNA subtype and 
under-represented for PIK3CA, MAP3K1 and MAP2K4 mutations.
Group 5 showed the lowest levels of DNA methylation, overlapped 
with the basal-like mRNA subtype, and showed a high frequency of 
TP53 mutations. HER2-positive (HER2+) clinical status, or the
HER2E mRNA subtype, had only a modest association with the 
methylation subtypes. 

A supervised analysis of the DNA methylation and mRNA expres- 
sion data was performed to compare DNA methylation group 3 
(N = 49) versus all tumours in groups 1, 2 and 4 (excluding group 5, 
which consisted predominantly of basal-like tumours). This analysis 
identified 4,283 genes differentially methylated (3,735 higher in group 
3 tumours) and 1,899 genes differentially expressed (1,232 downregu- 
lated); 490 genes were both methylated and showed lower expression 
in group 3 tumours (Supplementary Table 4). A DAVID (database for 
annotation, visualization and integrated discovery) functional annotation
analysis identified ‘extracellular region part’ and ‘Wnt signalling
pathway’ to be associated with this 490-gene set; the group 3
hypermethylated samples showed fewer PIK3CA and MAP3K1 mutations,
and lower expression of Wnt-pathway genes. 


DNA copy number 


A total of 773 breast tumours were assayed using Affymetrix 6.0 
SNP arrays. Segmentation analysis and GISTIC were used to 
identify focal amplifications/deletions and arm-level gains and losses
(Supplementary Table 5). These analyses confirmed all previously 
reported copy number variations and highlighted a number of sig- 
nificantly mutated genes including focal amplification of regions con- 
taining PIK3CA, EGFR, FOXA1 and HER2, as well as focal deletions of 
regions containing MLL3, PTEN, RB1 and MAP2K4 (Supplementary
Fig. 9); in all cases, multiple genes were included within each altered 
region. Importantly, many of these copy number changes correlated 
with mRNA subtype including characteristic loss of 5q and gain of 
10p in basal-like cancers*”’ and gain of 1q and/or 16q loss in luminal 
tumours*. NMF clustering of GISTIC segments identified five copy 
number clusters/groups that correlated with mRNA subtypes, ER, PR 
and HER2 clinical status, and TP53 mutation status (Supplementary 
Fig. 10). In addition, this aCGH subtype classification was highly 
correlated with the aCGH subtypes recently defined by ref. 30 
(Supplementary Fig. 11). 


Reverse phase protein arrays 


Quantified expression of 171 cancer-related proteins and phospho- 
proteins by RPPA was performed on 403 breast tumours*’. 
Unsupervised hierarchical clustering analyses identified seven 
subtypes; one class contained too few cases for further analysis 
(Supplementary Fig. 12). These protein subtypes were highly 
concordant with the mRNA subtypes, particularly with basal-like 
and HER2E mRNA subtypes. Closer examination of the HER2-containing
RPPA-defined subgroup showed coordinated overexpression
of HER2 and EGFR with a strong concordance with phosphorylated
HER2 (pY1248) and EGFR (pY992), probably from heterodimeriza- 
tion and cross-phosphorylation. Although there is a potential for 
modest cross reactivity of antibodies against these related total and 
phospho-proteins, the concordance of phosphorylation of HER2 and 
EGFR was confirmed using multiple independent antibodies. 

In RPPA-defined luminal tumours, there was high protein expres- 
sion of ER, PR, AR, BCL2, GATA3 and INPP4B, defining mostly 
luminal A cancers and a second more heterogeneous protein subgroup 
composed of both luminal A and luminal B cancers. Two potentially 
novel protein-defined subgroups were identified: reactive I consisted 
primarily of a subset of luminal A tumours, whereas reactive II 
consisted of a mixture of mRNA subtypes. These groups are termed 
‘reactive’ because many of the characteristic proteins are probably 
produced by the microenvironment and/or cancer-activated fibroblasts 
including fibronectin, caveolin 1 and collagen VI. These two RPPA 
groups did not have a marked difference in the percentage tumour cell 
content when compared to each other, or the other protein subtypes, 
as assessed by SNP array analysis or pathological examination. In 
addition, supervised analyses of reactive I versus II groups using 
miRNA expression, DNA methylation, mutation, or DNA copy 
number data identified no significant differences between these groups, 
whereas similar supervised analyses using protein and mRNA expres- 
sion identified many differences. 


Multiplatform subtype discovery 

To reveal higher-order structure in breast tumours based on multiple 
data types, significant clusters/subtypes from each of five platforms 
were analysed using a multiplatform data matrix subjected to 
unsupervised consensus clustering (Fig. 2). This ‘cluster of clusters’ 
(C-of-C) approach illustrated that basal-like cancers had the most 
distinct multiplatform signature as all the different platforms for 
the basal-like groups clustered together. To a great extent, the four 
major C-of-C subdivisions correlated well with the previously published 
mRNA subtypes (driven, in part, by the fact that the four intrinsic 
subtypes were one of the inputs). Therefore, we also performed 
C-of-C analysis with no mRNA data present (Supplementary Fig. 13) 
or with the 12 unsupervised mRNA subtypes (Supplementary Fig. 14), 
and in each case 4-6 groups were identified. Recent work identified ten 
copy-number-based subgroups in a 997 breast cancer set’. We evaluated 

Figure 2 | Coordinated analysis of breast cancer subtypes defined from five
different genomic/proteomic platforms. a, Consensus clustering analysis of 
the subtypes identifies four major groups (samples, n = 348). The blue and 
white heat map displays sample consensus. b, Heat-map display of the subtypes 
defined independently by miRNAs, DNA methylation, copy number (CN), 
PAM50 mRNA expression, and RPPA expression. The red bar indicates 
membership of a cluster type. c, Associations with molecular and clinical
features. P values were calculated using a chi-squared test. 


this classification in a C-of-C analysis instead of our five-class copy 
number subtypes, with either the PAM50 (Supplementary Fig. 15) or 
12 unsupervised mRNA subtypes (Supplementary Fig. 16); each of 
these C-of-C classifications was highly correlated with PAM50 
mRNA subtypes and with the other C-of-C analyses (Fig. 2). The 
transcriptional profiling and RPPA platforms demonstrated a high 
correlation with the consensus structure, indicating that the informa- 
tion content from copy number aberrations, miRNAs and methylation 
is captured at the level of gene expression and protein function. 
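
The input to a 'cluster of clusters' analysis can be pictured as a binary matrix in which each column is one platform-specific cluster and each row is a tumour. The sketch below builds such a matrix and, as a simple stand-in for the consensus clustering step (which is not implemented here), reports the fraction of platforms on which two samples share a cluster. All sample identifiers and cluster labels are hypothetical.

```python
from itertools import combinations

def indicator_matrix(assignments):
    """Build a samples x (platform, cluster) binary matrix.

    `assignments` maps platform name -> {sample id -> cluster label}.
    Only samples present on every platform are kept.
    """
    samples = sorted(set.intersection(*(set(a) for a in assignments.values())))
    columns = sorted({(p, c) for p, a in assignments.items() for c in a.values()})
    matrix = {
        s: [1 if assignments[p].get(s) == c else 0 for p, c in columns]
        for s in samples
    }
    return samples, columns, matrix

def agreement(matrix, a, b):
    """Fraction of platforms on which samples a and b share a cluster."""
    shared = sum(x & y for x, y in zip(matrix[a], matrix[b]))
    n_platforms = sum(matrix[a])  # each sample has exactly one 1 per platform
    return shared / n_platforms

# Hypothetical cluster calls on three platforms.
toy = {
    "mRNA": {"s1": "basal", "s2": "basal", "s3": "lumA"},
    "methylation": {"s1": "g5", "s2": "g5", "s3": "g3"},
    "copy number": {"s1": "c1", "s2": "c2", "s3": "c4"},
}
samples, columns, matrix = indicator_matrix(toy)
for a, b in combinations(samples, 2):
    print(a, b, round(agreement(matrix, a, b), 2))
```

In a full analysis the binary matrix would be passed to a consensus clustering routine rather than summarized pairwise.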


Luminal/ER+ summary analysis


Luminal/ER+ breast cancers are the most heterogeneous in terms of
gene expression (Supplementary Fig. 5), mutation spectrum (Fig. 1), 
copy number changes (Supplementary Fig. 9) and patient outcomes’*. 
One of the most dominant features is high mRNA and protein expres- 
sion of the luminal expression signature (Supplementary Fig. 5), which 
contains ESR1, GATA3, FOXA1, XBP1 and MYB; the luminal/ER+
cluster also contained the largest number of significantly mutated 
genes. Most notably, GATA3 and FOXA1 were mutated in a mutually 
exclusive fashion, whereas ESR1 and XBP1 were typically highly
expressed but infrequently mutated. Mutations in RUNX1 and its 
dimerization partner CBFB may also have a role in aberrant ER 
signalling in luminal tumours, as RUNX1 functions as an ER ‘DNA 
tethering factor”’. PARADIGM”® analysis comparing luminal versus 
basal-like cancers further emphasized the presence of a hyperactivated 
FOXA1-ER complex as a critical network hub differentiating these two 
tumour subtypes (Supplementary Fig. 17). 

A confirmatory finding here was the high mutation frequency 
of PIK3CA in luminal/ER+ breast cancers***°. Through multiple
technology platforms, we examined possible relationships between
PIK3CA mutation, PTEN loss, INPP4B loss and multiple gene 
and protein expression signatures of pathway activity. RPPA data 
demonstrated that pAKT, pS6 and p4EBP1, typical markers of 
phosphatidylinositol-3-OH kinase (PI(3)K) pathway activation, were 
not elevated in PIK3CA-mutated luminal A cancers; instead, they 
were highly expressed in basal-like and HER2E mRNA subtypes 
(the latter having frequent PIK3CA mutations) and correlated 
strongly with INPP4B and PTEN loss, and to a degree with PIK3CA 
amplification. Similarly, protein*® and three mRNA signatures”? of 
PI(3)K pathway activation were enriched in basal-like over luminal A 
cancers (Fig. 3a). This apparent disconnect between the presence of 
PIK3CA mutations and biomarkers of pathway activation has been 
previously noted”. 

Another striking luminal/ER+ subtype finding was the frequent
mutation of MAP3K1 and MAP2K4, which represent two contiguous 
steps within the p38-JNK1 pathway**”’. These mutations are predicted 
to be inactivating, with MAP2K4 also a target of focal DNA loss in 
luminal tumours (Supplementary Fig. 9). To explore the possible 
interplay between PIK3CA, MAP3K1 and MAP2K4 signalling, MEMo
analysis** was performed to identify mutually exclusive alterations 
targeting frequently altered genes likely to belong to the same pathway 
(Fig. 4). Across all breast cancers, MEMo identified a set of modules 
that highlight the differential activation events within the receptor 
tyrosine kinase (RTK)-PI(3)K pathway (Fig. 4a); mutations of 
PIK3CA were very common in luminal/ER+ cancers whereas PTEN
loss was more common in basal-like tumours. Almost all MAP3K1 and
MAP2K4 mutations were in luminal tumours, yet MAP3K1 and
MAP2K4 appeared almost mutually exclusive relative to one another.
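
The notion of mutual exclusivity between two genes can be illustrated with a simple pairwise test. The sketch below is not MEMo, which also draws on pathway information; it is a one-sided Fisher's exact test for fewer co-mutated tumours than expected by chance, applied to hypothetical mutation calls.

```python
from math import comb

def exclusivity_p_value(gene_a, gene_b):
    """One-sided Fisher's exact test for fewer co-mutations than expected.

    gene_a and gene_b are equal-length lists of 0/1 mutation calls per tumour.
    Returns (number of co-mutated tumours, p-value).
    """
    n = len(gene_a)
    n_a = sum(gene_a)
    n_b = sum(gene_b)
    both = sum(1 for a, b in zip(gene_a, gene_b) if a and b)
    total = comb(n, n_b)
    # Hypergeometric probability of `both` or fewer co-mutated tumours.
    p = sum(comb(n_a, k) * comb(n - n_a, n_b - k) for k in range(both + 1)) / total
    return both, p

# Hypothetical calls for 20 tumours: gene A mutated in 6, gene B in 5, no overlap.
toy_a = [1] * 6 + [0] * 14
toy_b = [0] * 6 + [1] * 5 + [0] * 9

both, p = exclusivity_p_value(toy_a, toy_b)
print(f"co-mutated tumours = {both}, one-sided p = {p:.4f}")
```

With so few mutated tumours, even a complete absence of overlap gives only weak evidence, which is one reason large cohorts are needed for this kind of analysis.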

The TP53 pathway was differentially inactivated in luminal/ER+
breast cancers, with a low TP53 mutation frequency in luminal A 
(12%) and a higher frequency in luminal B (29%) cancers (Fig. 1). In 
addition to TP53 itself, a number of other pathway-inactivating events 
occurred including ATM loss and MDM2 amplification (Figs 3b and 
4b), both of which occurred more frequently within luminal B cancers. 
Gene expression analysis demonstrated that individual markers of 
functional TP53 (GADD45A and CDKN1A), and TP53 activity?
signatures, were highest in luminal A cancers (Fig. 3b). These data 
indicate that the TP53 pathway remains largely intact in luminal A 
cancers but is often inactivated in the more aggressive luminal B 
cancers’. Other PARADIGM-based pathway differences driving 
luminal B versus luminal A included hyperactivation of transcriptional 
activity associated with MYC and FOXM1 proliferation. 

The critical retinoblastoma/RB1 pathway also showed mRNA- 
subtype-specific alterations (Fig. 3c). RB1 itself, by mRNA and 
protein expression, was detectable in most luminal cancers, with 
highest levels within luminal A. A common oncogenic event was 
cyclin D1 amplification and high expression, which preferentially 
occurred within luminal tumours, and more specifically within 
luminal B. In contrast, the presumed tumour suppressor CDKN2C 
(also called p18) was at its lowest levels in luminal A cancers, con- 
sistent with observations in mouse models. Finally, RB1 activity 
signatures were also high in luminal cancers***. Luminal A tumours, 
which have the best prognosis, are the most likely to retain activity of 
the major tumour suppressors RB1 and TP53. 

These genomic characterizations also provided clues for druggable 
targets. We compiled a drug target table in which we defined a target 
as a gene/protein for which there is an approved or investigational 
drug in human clinical trials targeting the molecule or canonical 
pathway (Supplementary Table 6). In luminal/ER+ cancers, the high
frequency of PIK3CA mutations suggests that inhibitors of this 
activated kinase or its signalling pathway may be beneficial. Other 
potential significantly mutated gene drug candidates include AKT1 
inhibitors (11 out of 12 AKT1 variants were luminal) and PARP
inhibitors for BRCA1/BRCA2 mutations. Although still unapproved 
as biomarkers, many potential copy-number-based drug targets 

Figure 3 | Integrated analysis of the PI(3)K, TP53 and RB1 pathways. Breast
cancer subtypes differ by genetic and genomic targeting events, with 
corresponding effects on pathway activity. a-c, For PI(3)K (a), TP53 (b) and 
RB1 (c) pathways, key genes were selected using prior biological knowledge.
Multiple mRNA expression signatures for a given pathway were defined 
(details in Supplementary Methods; PI(3)K:Saal, PTEN loss in human breast 
tumours; CMap, PI(3)K/mTOR inhibitor treatment in vitro; Majumder, Akt 
overexpression in mouse model; TP53: IARC, expert-curated p53 targets; GSK, 
TP53 mutant versus wild-type cell lines; KANNAN, TP53 overexpression in 
vitro; TROESTER, TP53 knockdown in vitro; RB: CHICAS, RB1 mouse 
knockout versus wild type; LARA, RB1 knockdown in vitro; 
HERSCHKOWITZ, RB1 loss of heterozygosity (LOH) in human breast 
tumours) and applied to the gene expression data, in order to score each 
tumour for relative signature activity (yellow, more active). The PI(3)K panel 
includes a protein-based (RPPA) proteomic signature. Tumours were ordered 
first by mRNA subtype, although specific ordering differs between the panels. P 
values were calculated by a Pearson’s correlation or a Chi-squared test. 


were identified including amplifications of fibroblast growth factor 
receptors (FGFRs) and IGFR1, as well as cyclin D1, CDK4 and CDK6. 
A summary of the general findings in luminal tumours and the other 
subtypes is presented in Table 1. 


HER2-based classifications and summary analysis 
DNA amplification of HER2 was readily evident in this study 
(Supplementary Fig. 9) together with overexpression of multiple 
HER2-amplicon-associated genes that in part define the HER2E
mRNA subtype (Supplementary Fig. 5). However, not all clinically 
HER2+ tumours are of the HER2E mRNA subtype, and not all
tumours in the HER2E mRNA subtype are clinically HER2+.
Integrated analysis of the RPPA and mRNA data clearly identified a
HER2+ group (Supplementary Fig. 12). When the HER2+ protein
and HER2E mRNA subtypes overlapped, a strong signal of EGFR,
pEGFR, HER2 and pHER2 was observed. However, only ~50% of
clinically HER2+ tumours fall into this HER2E-mRNA-subtype/
HER2-protein group; the rest of the clinically HER2+ tumours were
observed predominantly in the luminal mRNA subtypes. 

These data indicate that there exist at least two types of clinically 
defined HER2+ tumours. To identify differences between these groups,
a supervised gene expression analysis comparing 36 HER2E-mRNA-subtype/HER2+
versus 31 luminal-mRNA-subtype/HER2+ tumours
was performed and identified 302 differentially expressed genes
(q-value = 0%) (Supplementary Fig. 18 and Supplementary Table 7).
These genes largely track with ER status but also indicated that
HER2E-mRNA-subtype/HER2+ tumours showed significantly higher expression
of a number of RTKs including FGFR4, EGFR, HER2 itself, as well
as genes within the HER2 amplicon (including GRB7). Conversely, the
luminal-mRNA-subtype/HER2+ tumours showed higher expression
of the luminal cluster of genes including GATA3, BCL2 and ESR1.
Further support for two types of clinically defined HER2+ disease
was evident in the somatic mutation data supervised by either
mRNA subtype or ER status; TP53 mutations were significantly
enriched in HER2E or ER-negative tumours whereas GATA3 mutations
were only observed in luminal subtypes or ER+ tumours.

Analysis of the RPPA data according to mRNA subtype identified 
36 differentially expressed proteins (q-value <5%) (Supplementary 
Fig. 18G and Supplementary Table 8). The EGFR/pEGFR/HER2/ 
pHER2 signal was again observed and present within the
HER2E-mRNA-subtype/HER2+ tumours, as was high pSRC and pS6;
conversely, many protein markers of luminal cancers again distinguished
the luminal-mRNA-subtype/HER2+ tumours. Given the importance
of clinical HER2 status, a more focused analysis was performed based 
on the RPPA-defined protein expression of HER2 (Supplementary 
Fig. 19)—the results strongly recapitulated findings from the RPPA 
and mRNA subtypes including a high correlation between HER2 
clinical status, HER2 protein by RPPA, pHER2, EGFR and pEGFR.
These multiple signatures, namely HER2E mRNA subtype, HER2 
amplicon genes by mRNA expression, and RPPA EGFR/pEGFR/ 
HER2/pHER2 signature, ultimately identify at least two groups/ 
subtypes within clinically HER2+ tumours (Table 1). These signatures
represent breast cancer biomarker(s) that could potentially predict 
response to anti-HER2 targeted therapies.

Many therapeutic advances have been made for clinically HER2+
disease. This study has identified additional somatic mutations that 
represent potential therapeutic targets within this group, including a 
high frequency of PIK3CA mutations (39%), a lower frequency of 
PTEN and PIK3R1 mutations (Supplementary Table 6), and genomic 
losses of PTEN and INPP4B. Other possible druggable mutations 
included variants within HER family members including two somatic 
mutations in HER2, two within EGFR, and five within HER3. 
Pertuzumab, in combination with trastuzumab, targets the HER2- 
HER3 heterodimer®’; however, these data suggest that targeting EGFR 
with HER2 could also be beneficial. Finally, the HER2E mRNA
subtype typically showed high aneuploidy, the highest somatic 
mutation rate (Table 1), and DNA amplification of other potential 
therapeutic targets including FGFRs, EGFR, CDK4 and cyclin D1. 


Basal-like summary analysis 

The basal-like subtype was discovered more than a decade ago by 
first-generation cDNA microarrays’. These tumours are often 
referred to as triple-negative breast cancers (TNBCs) because most 
basal-like tumours are typically negative for ER, PR and HER2. 

Figure 4 | Mutual exclusivity modules in cancer (MEMo) analysis. Mutual
exclusivity modules are represented by their gene components and connected 
to reflect their activity in distinct pathways. For each gene, the frequency of 
alteration in basal-like (right box) and non-basal (left box) is reported. Next to 
each module is a fingerprint indicating what specific alteration is observed for 
each gene (row) in each sample (column). a, MEMo identified several 
overlapping modules that recapitulate the RTK-PI(3)K and p38-JNK1 


However, ~75% of TNBCs are basal-like with the other 25% com- 
prised of all other mRNA subtypes*. In this data set, there was a high 
degree of overlap between these two distinctions with 76 TNBCs, 81 
basal-like, and 65 that were both TNBCs and basal-like. Given the 
known heterogeneity of TNBCs, and that the basal-like subtype 
proved to be distinct on every platform, we chose to use the basal-like 
distinction for comparative analyses. 
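
The degree of overlap between the two distinctions follows directly from the counts given above (76 TNBCs, 81 basal-like tumours, 65 in both groups). The short calculation below is an illustrative check; the Jaccard index is added here as one convenient summary and is not a statistic reported in the text.

```python
# Counts reported in the text for this data set.
tnbc, basal_like, both = 76, 81, 65

tnbc_that_are_basal = both / tnbc          # share of TNBCs that are basal-like
basal_that_are_tnbc = both / basal_like    # share of basal-like tumours that are TNBCs
jaccard = both / (tnbc + basal_like - both)  # overlap relative to the union

print(f"{tnbc_that_are_basal:.1%} of TNBCs are basal-like")
print(f"{basal_that_are_tnbc:.1%} of basal-like tumours are TNBCs")
print(f"Jaccard index = {jaccard:.2f}")
```

The second ratio, roughly 80%, is in line with the proportion of TNBCs listed for the basal-like subtype in Table 1.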

Basal-like tumours showed a high frequency of TP53 mutations 
(80%)’, which when combined with inferred TP53 pathway activity 
suggests that loss of TP53 function occurs within most, if not all, 
basal-like cancers (Fig. 3b). In addition to loss of TP53, a MEMo 
analysis reconfirmed that loss of RB1 and BRCA1 are basal-like
features (Fig. 4c)*”°°. PIK3CA was the next most commonly mutated 
gene (~9%); however, inferred PI(3)K pathway activity, whether 
from gene*””’, protein®’, or high PI(3)K/AKT pathway activities, 
was highest in basal-like cancers (Fig. 3a). Alternative means of 
activating the PI(3)K pathway in basal-like cancers probably include
loss of PTEN and INPP4B and/or amplification of PIK3CA. A recent 
paper’* performed exome sequencing of 102 TNBCs. Five of the top 
six most frequent TNBC mutations in ref. 12 were also observed at a 
similar frequency in our TNBC subset (Myo3A not present here); of 
those five, three passed our test as a significantly mutated gene in 
TNBCs (Supplementary Table 2). 


signalling pathways and whose core was the top-scoring module. b, MEMo 
identified alterations to TP53 signalling as occurring within a statistically 
significant mutually exclusive trend. c, A basal-like only MEMo analysis 
identified one module that included ATM mutations, defects at BRCA1 and
BRCA2, and deregulation of the RB1 pathway. A gene expression heat map is 
below the fingerprint to show expression levels. 


Expression features of basal-like tumours include a characteristic 
signature containing keratins 5, 6 and 17 and high expression of 
genes associated with cell proliferation (Supplementary Fig. 5). 
A PARADIGM™ analysis of basal-like versus luminal tumours 
emphasized the importance of hyperactivated FOXM1 as a transcrip- 
tional driver of this enhanced proliferation signature (Supplementary 
Fig. 17). PARADIGM also identified hyperactivated MYC and HIF1-α/ARNT
network hubs as key regulatory features of basal-like
cancers. Even though chromosome 8q24 is amplified across all 
subtypes (Supplementary Fig. 9), high MYC activation seems to be 
a basal-like characteristic*’. 

Given the striking contrasts between basal-like and luminal/ 
HER2E subtypes, we performed a MEMo analysis on basal-like 
tumours alone. The top-scoring module included ATM mutations, 
BRCA1 and BRCA2 inactivation, RB1 loss and cyclin E1 amplification
(Fig. 4c). Notably, these same modules were identified previously for 
serous ovarian cancers*’. Furthermore, the basal-like (and TNBC) 
mutation spectrum was reminiscent of the spectrum seen in serous 
ovarian cancers” with only one gene (that is, TP53) at >10% muta- 
tion frequency. To explore possible similarities between serous 
ovarian and the breast basal-like cancers, we performed a number 
of analyses comparing ovarian versus breast luminal, ovarian versus 
breast basal-like, and breast basal-like versus breast luminal cancers 

Table 1 | Highlights of genomic, clinical and proteomic features of subtypes

Subtype | Luminal A | Luminal B | Basal-like | HER2E
ER+/HER2− (%) | 87 | 82 | 10 | 20
HER2+ (%) | 7 | 15 | 2 | 68
TNBCs (%) | 2 | 1 | 80 | 9
TP53 pathway | TP53 mut (12%); gain of MDM2 (14%) | TP53 mut (32%); gain of MDM2 (31%) | TP53 mut (84%); gain of MDM2 (14%) | TP53 mut (75%); gain of MDM2 (30%)
PIK3CA/PTEN pathway | PIK3CA mut (49%); PTEN mut/loss (13%); INPP4B loss (9%) | PIK3CA mut (32%); PTEN mut/loss (24%); INPP4B loss (16%) | PIK3CA mut (7%); PTEN mut/loss (35%); INPP4B loss (30%) | PIK3CA mut (42%); PTEN mut/loss (19%); INPP4B loss (30%)
RB1 pathway | Cyclin D1 amp (29%); CDK4 gain (14%); low expression of CDKN2C; high expression of RB1 | Cyclin D1 amp (58%); CDK4 gain (25%) | RB1 mut/loss (20%); cyclin E1 amp (9%); high expression of CDKN2A; low expression of RB1 | Cyclin D1 amp (38%); CDK4 gain (24%)
mRNA expression | High ER cluster; low proliferation | Lower ER cluster; high proliferation | Basal signature; high proliferation | HER2 amplicon signature; high proliferation
Copy number | Most diploid; many with quiet genomes; 1q, 8q, 8p11 gain; 8p, 16q loss; 11q13.3 amp (24%) | Most aneuploid; many with focal amp; 1q, 8q, 8p11 gain; 8p, 16q loss; 11q13.3 amp (51%); 8p11.23 amp (28%) | Most aneuploid; high genomic instability; 1q, 10p gain; 8p, 5q loss; MYC focal gain (40%) | Most aneuploid; high genomic instability; 1q, 8q gain; 8p loss; 17q12 focal ERBB2 amp (71%)
DNA mutations | PIK3CA (49%); TP53 (12%); GATA3 (14%); MAP3K1 (14%) | TP53 (32%); PIK3CA (32%); MAP3K1 (5%) | TP53 (84%); PIK3CA (7%) | TP53 (75%); PIK3CA (42%); PIK3R1 (8%)
DNA methylation | - | Hypermethylated phenotype for subset | Hypomethylated | -
Protein expression | High oestrogen signalling; high MYB; RPPA reactive subtypes | Less oestrogen signalling; high FOXM1 and MYC; RPPA reactive subtypes | High expression of DNA repair proteins, PTEN and INPP4B loss signature (pAKT) | High protein and phospho-protein expression of EGFR and HER2

Percentages are based on 466 tumour overlap list. Amp, amplification; mut, mutation.


(Fig. 5). Comparing copy number landscapes, we observed several 
common features between ovarian and basal-like tumours including 
widespread genomic instability and common gains of 1q, 3q, 8q and 
12p, and loss of 4q, 5q and 8p (Supplementary Fig. 20A). Using a 
more global copy number comparison, we examined the overall 
fraction of the genome altered and the overall copy number correla- 
tion of ovarian cancers versus each breast cancer mRNA subtype 


Figure 5 | Comparison of breast and serous ovarian carcinomas.

a, Significantly enriched genomic alterations identified by comparing basal-like 
or serous ovarian tumours to luminal cancers. b, Inter-sample correlations 
(yellow, positive) between gene transcription profiles of breast tumours 
(columns; TCGA data, arranged by subtype) and profiles of cancers from 
various tissues of origin (rows; external ‘TGEN expO’ data set, GSE2109)
including ovarian cancers. 


(Supplementary Fig. 20A, B); in both cases, basal-like tumours were 
the most similar to the serous ovarian carcinomas. 
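
The 'fraction of the genome altered' mentioned above can be computed from segmented copy number data. The sketch below is one simple way to do so, assuming each segment carries a log2 copy ratio and is called altered when its absolute value exceeds a cut-off; the 0.2 threshold and the toy segments are hypothetical, and the text does not specify the definition actually used.

```python
def fraction_genome_altered(segments, threshold=0.2):
    """Fraction of covered bases in segments with |log2 ratio| > threshold.

    `segments` is a list of (start, end, log2_ratio) tuples with end > start.
    """
    total = sum(end - start for start, end, _ in segments)
    altered = sum(
        end - start for start, end, ratio in segments if abs(ratio) > threshold
    )
    return altered / total if total else 0.0

# Hypothetical segments on a single 5 Mb region.
toy_segments = [
    (0, 1_000_000, 0.02),           # near-neutral
    (1_000_000, 1_500_000, 0.65),   # gain
    (1_500_000, 4_000_000, -0.45),  # loss
    (4_000_000, 5_000_000, 0.10),   # near-neutral
]

fga = fraction_genome_altered(toy_segments)
print(f"fraction of genome altered = {fga:.2f}")
```

Comparing this value across tumour groups gives a single-number view of genomic instability that complements the arm-level comparison.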

We systematically looked for other common features between 
serous ovarian and basal-like tumours when each was compared to 
luminal. We identified: (1) BRCA1 inactivation; (2) RB1 loss and
cyclin E1 amplification; (3) high expression of AKT3; (4) MYC
amplification and high expression; and (5) a high frequency of 
TP53 mutations (Fig. 5a). An additional supervised analysis of a large, 
external multitumour type transcriptomic data set (Gene Expression 
Omnibus accession GSE2109) was performed where each TCGA 
(The Cancer Genome Atlas) breast tumour expression profile was 
compared via a correlation analysis to that of each tumour in the 
multitumour set. Basal-like breast cancers clearly showed high 
mRNA expression correlations with serous ovarian cancers, as well 
as with lung squamous carcinomas (Fig. 5b). A PARADIGM analysis 
that calculates whether a gene or pathway feature is both differentially 
activated in basal-like versus luminal cancers and has higher overall 
activity across the TCGA ovarian samples was performed; this identified
comparably high pathway activity of the HIF1-α/ARNT, MYC
and FOXM1 regulatory hubs in both ovarian and basal-like cancers 
(Supplementary Fig. 20C). The common findings of TP53, RB1 and 
BRCA1 loss, with MYC amplification, strongly suggest that these are
shared driving events for basal-like and serous ovarian carcinogenesis. 
This suggests that common therapeutic approaches should be con- 
sidered, which is supported by the activity of platinum analogues and 
taxanes in breast basal-like and serous ovarian cancers. 

Given that most basal-like cancers are TNBCs, finding new drug 
targets for this group is critical. Unfortunately, the somatic mutation 
repertoire for basal-like breast cancers has not provided a common 
target aside from BRCA1 and BRCA2. Here we note that ~20% of 
basal-like tumours had a germline (n = 12) and/or somatic (n = 8)
BRCA1 or BRCA2 variant, which suggests that one in five basal-like
patients might benefit from PARP inhibitors and/or platinum com- 
pounds****. The copy number landscape of basal-like cancers showed 
multiple amplifications and deletions, some of which may provide 
therapeutic targets (Supplementary Table 6). Potential targets include 
losses of PTEN and INPP4B, both of which have been shown to 
sensitize cell lines to PI(3)K pathway inhibitors****. Interestingly, 
many of the components of the PI(3)K and RAS-RAF-MEK pathway 
were amplified (but not typically mutated) in basal-like cancers 
including PIK3CA (49%), KRAS (32%), BRAF (30%) and EGFR 
(23%). Other RTKs that are plausible drug targets and amplified in 
some basal-like cancers include FGFR1, FGFR2, IGFR1, KIT, MET
and PDGFRA. Finally, the PARADIGM identification of high HIF1-α/ARNT
pathway activity suggests that these malignancies might be
susceptible to angiogenesis inhibitors and/or bioreductive drugs that 
become activated under hypoxic conditions. 


Concluding remarks 


The integrated molecular analyses of breast carcinomas that we 
report here significantly extend our knowledge base to produce a
comprehensive catalogue of likely genomic drivers of the most 
common breast cancer subtypes (Table 1). Our novel observation that 
diverse genetic and epigenetic alterations converge phenotypically 
into four main breast cancer classes is not only consistent with con- 
vergent evolution of gene circuits, as seen across multiple organisms, 
but also with models of breast cancer clonal expansion and in vivo cell 
selection proposed to explain the phenotypic heterogeneity observed 
within defined breast cancer subtypes. 


METHODS SUMMARY 


Specimens were obtained from patients with appropriate consent from institutional 
review boards. Using a co-isolation protocol, DNA and RNA were purified. In total, 
800 patients were assayed on at least one platform. Different numbers of patients 
were used for each platform using the largest number of patients available at the 
time of data freeze; 466 samples (463 patients) were in common across 5 out of 6 
platforms (excluding RPPA) and 348 patients were in common on 6 out of 6 
platforms. Technology platforms used include: (1) gene expression DNA microarrays; 
(2) DNA methylation arrays; (3) miRNA sequencing; (4) Affymetrix SNP 
arrays; (5) exome sequencing; and (6) reverse phase protein arrays. Each platform, 
except for the exome sequencing, was used in a de novo subtype discovery analysis 
(Supplementary Methods) and then included in a single analysis to define an overall 
subtype architecture. Additional integrated across-platform computational analyses 
were performed, including PARADIGM and MEMo. 
Two stellar-mass black holes in the globular cluster M22 


Jay Strader, Laura Chomiuk, Thomas J. Maccarone, James C. A. Miller-Jones & Anil C. Seth 


Hundreds of stellar-mass black holes probably form in a typical 
globular star cluster, with all but one predicted to be ejected 
through dynamical interactions'*. Some observational support 
for this idea is provided by the lack of X-ray-emitting binary stars 
comprising one black hole and one other star (‘black-hole/X-ray 
binaries’) in Milky Way globular clusters, even though many 
neutron-star/X-ray binaries are known*. Although a few black 
holes have been seen in globular clusters around other galaxies”®, 
the masses of these cannot be determined, and some may be 
intermediate-mass black holes that form through exotic mechan- 
isms’. Here we report the presence of two flat-spectrum radio 
sources in the Milky Way globular cluster M22, and we argue that 
these objects are black holes of stellar mass (each ~10-20 times 
more massive than the Sun) that are accreting matter. We find a 
high ratio of radio-to-X-ray flux for these black holes, consistent 
with the larger predicted masses of black holes in globular clusters 
compared to those outside®. The identification of two black holes in 
one cluster shows that ejection of black holes is not as efficient as 
predicted by most models’*, and we argue that M22 may contain a 
total population of ~5-100 black holes. The large core radius of 
M22 could arise from heating produced by the black holes’. 

We have obtained very deep radio continuum images of the Milky 
Way globular cluster M22 (NGC 6656) with the Karl G. Jansky Very 
Large Array (VLA). The principal goal of the observations was to search 
for a possible central intermediate-mass black hole via synchrotron 
emission from the accretion of intracluster gas; no central source was 
found’°. However, we serendipitously detected two previously unknown 
radio continuum sources in the core of the cluster (Fig. 1). We term the 
sources M22-VLA1 and M22-VLA2. Both sources have flat radio 
spectra and are unresolved at our ~1” resolution. 

The core radius of M22 is uncommonly large for a Milky Way 
globular cluster, namely, ~1.24 pc (ref. 11). These sources are well 
inside the cluster core, at projected radii of 0.4 pc and 0.25 pc for 
M22-VLA1 and M22-VLA2, respectively. The next source of com- 
parable flux density is far outside the core, at a projected radius of 
2.4 pc. These sources have no counterparts in shallow archival 
Chandra X-ray imaging. On the basis of these non-detections, the 
sources are constrained to have X-ray luminosities $L_X \lesssim 2.2 \times 10^{30}$ erg s$^{-1}$ 
over 3-9 keV at the distance of M22. The radio luminosities of the 
sources at 8.4 GHz are $L_R \approx 6 \times 10^{27}$ erg s$^{-1}$, assuming flat spectra. 
Therefore, if the sources are not variable, the limit of radio to X-ray 
luminosity is $\log(L_R/L_X) \gtrsim -2.6$. 
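The quoted limit on the luminosity ratio can be checked with a few lines of arithmetic. This is only a sketch: it assumes the garbled exponents read as an X-ray upper limit of 2.2 × 10^30 erg/s and a radio luminosity of 6 × 10^27 erg/s, and the function name is illustrative.

```python
import math

# Values as read from the text (erg/s). The exponents are an assumption made
# here, chosen because they reproduce the quoted ratio of about -2.6.
L_R = 6e27        # radio luminosity at 8.4 GHz, assuming a flat spectrum
L_X_MAX = 2.2e30  # upper limit on the 3-9 keV X-ray luminosity


def log_ratio_lower_limit(l_radio, l_xray_max):
    """An upper limit on L_X gives a lower limit on log10(L_R / L_X)."""
    return math.log10(l_radio / l_xray_max)


ratio = log_ratio_lower_limit(L_R, L_X_MAX)
print(f"log(L_R/L_X) >= {ratio:.1f}")
```

Because the X-ray value is only an upper limit, the result is a lower bound on the ratio, which is why the text states it with a "greater than or about" sign.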

The radio luminosity, $L_R/L_X$ ratio and central location of the sources 
place significant constraints on their nature. The most likely explana- 
tion is that both sources are accreting stellar-mass black holes in M22. 
Other possibilities, all of which we consider unlikely, are discussed 
in Supplementary Information. These objects are the first strong 
candidates for stellar-mass black holes in any Milky Way globular 
cluster, and the first stellar-mass black holes to be discovered through 
radio emission rather than via X-rays’’. 


The radio emission implies that the black holes are actively accreting, 
and the flat radio spectra are consistent with relatively low accretion 
rates (<2-3% of the Eddington rate). Because globular clusters 
have modest amounts of interstellar gas, it is very unlikely that the 
radio luminosity can be explained by Bondi accretion. Thus the objects 
cannot be black-hole/black-hole binaries, and instead are probably in 
binary systems with Roche lobe-overflowing companions. Stellar-mass 
black holes, ~5-100 times the solar mass, M⊙, offer the best 
explanation for the presence of multiple sources close to the cluster 
centre; objects more massive than the average cluster star will sink to 
the centre because of mass segregation. 

To look for optical counterparts of the radio sources, we used 
archival Hubble Space Telescope (HST) imaging of M22, for which 
photometric catalogues are available’. Figure 2 shows that M22-VLA1 
is a close match (0.05") to a moderately low-mass (~0.34 M⊙) main 
sequence M dwarf in M22, as inferred from standard stellar isochrones 
(see Supplementary Information for more details). M22-VLA2 is 0.17" 



Figure 1 | VLA radio continuum image of the core of the globular cluster 
M22. The two bright circled objects are the sources identified as stellar-mass 
black holes, M22-VLA1 and M22-VLA2. These sources have flux densities of 
55-58 μJy at 5.9 GHz. We obtained the data in two separate 1-GHz-wide 
basebands centred at 5 and 6.75 GHz, allowing a measurement of the spectral 
index of the radio emission between these frequencies. Both sources have flat 
radio spectra, with α = 0-0.2, assuming $S_\nu \propto \nu^{\alpha}$ (here $S_\nu$ is flux density, and ν is 
frequency). The faint circled object is a known millisecond pulsar. A red cross 
marks the photometric cluster centre. 20" corresponds to approximately 0.3 pc 
at the distance of M22. The apparent elongation of the sources is due entirely to 
the elongated synthesized beam; all three circled sources are unresolved. North 
is up and east is to the left in this image. 
Figure 2 | Optical images of M22 and the candidate companion stars to the 
radio sources. a, Ground-based image that shows the approximate location of 
the sources in the context of the star cluster. b, c, The zoomed-in location of the 
radio sources (b, M22-VLA1; c, M22-VLA2) on an archival high-resolution 
HST/Advanced Camera for Surveys F814W image. Each blue circle has a radius 
of 0.3” for clarity; the uncertainty in the astrometric matching of the optical and 
radio data is <0.1”. The image orientation is as in Fig. 1. (Image credit for a: D. 
Matthews/A. Block/NOAO/AURA/NSF.) 


from the nearest detected star, which is a ~0.62 M⊙ main sequence 
star. Considering the distribution of stars in the inner 30” of the cluster, 
the probability of a chance coincidence as close as for M22-VLA1 is 
only 2%; for M22-VLA2 it is 26%. Thus we consider the optical asso- 
ciation for source M22-VLA1 suggestive, but that for M22-VLA2 
uncertain. However, for the case of M22-VLA1, there is an additional 
complication: because the average stellar mass in the core is greater 
than that of the putative companion, the low-mass main sequence 
star would probably be exchanged out of the binary in a three-body 
interaction with another star’. On the other hand, because of the low 
central density of M22 (<$10^4\,M_\odot$ pc$^{-3}$; ref. 11), a binary with a low-mass 
companion might survive longer than in a typical globular cluster. 
Nonetheless, it is possible that both radio sources are associated with 
low-luminosity objects below the detection limit of the HST data, such 
as white dwarfs. 

Stellar-mass black holes with accretion rates below ~2% of the 
Eddington rate’ (in the so-called low/hard state) follow an empirical 
correlation between radio and X-ray luminosity with a scatter of a 
factor of about two (ref. 17). Figure 3 shows this correlation with the 
M22 data overplotted. The radio-X-ray relation predicts an X-ray 
luminosity of $10^{31}$-$10^{32}$ erg s$^{-1}$ for this radio luminosity, above 
the completeness limit of the archival Chandra data. There are several 
plausible explanations for this discrepancy. First, there is the possibility 
of variability. The X-ray data were taken in 2005, six years earlier than 
the radio data. Field stellar-mass black holes in the low/hard state show 
substantial (typically a factor of 2-10) variability in both radio and 
X-rays”°*". Therefore, concurrent radio and X-ray data are necessary 
for precise constraints on $L_R/L_X$. We found marginal evidence for radio 
variability in M22-VLA2 on the timescale of a week; more details can 
be found in Supplementary Information. Another plausible explanation 
is that there is larger scatter in the radio-X-ray correlation at very 
low accretion rates. Only a single known black-hole binary has a mea- 
sured radio luminosity as faint as our sources’*, and there is evidence 
that some stellar-mass black holes with low X-ray luminosities may not 
fall on the correlation”. 

An intriguing possibility is that these sources have high values of 
$L_R/L_X$ because they are more massive than typical stellar-mass black 
holes in the field. The radio-X-ray correlation for stellar-mass black 
holes is a special case of a ‘fundamental plane’ for black-hole accretion 
in the low/hard state that includes the black-hole mass as a third 
Figure 3 | Radio-X-ray correlation for stellar-mass black holes. The M22 
sources have properties more consistent with black holes than with neutron 
stars or white dwarfs. Filled squares represent simultaneous radio and X-ray 
data; open squares are non-simultaneous measurements, the positions of which 
might have been affected by variability. Upper limits are also shown. Some 
objects have multiple measurements plotted that represent different phases of 
accretion. The open red circle represents both M22-VLA1 and M22-VLA2, 
which have very similar luminosities. The dotted black line represents the 
published correlation'* Lz <L}°*, normalized by a least-squares fit to the 
simultaneous detections with Lx <2 X 10° ergs~'. The dashed and dot- 
dashed blue lines show two possible radio-X-ray correlations for accreting 
neutron stars; this relation is poorly constrained by observations”. The solid 
green line shows the maximum radio continuum luminosity observed for 
accreting white dwarfs”. Neither neutron stars nor white dwarfs have 
properties consistent with the M22 radio sources. More information can be 
found in Supplementary Information. 


parameter. In this relation, more massive black holes have larger 
values of $L_R/L_X$. If our sources have masses of ~15-20 M⊙ rather than 
the 5-10 M⊙ typical of field stellar-mass black holes, then their X-ray 
luminosities should be lower than predicted by the correlation in Fig. 3 
by a factor of ~2-3. It is reasonable to expect that black holes in 
globular clusters will be more massive than those in the field. Field 
black holes with measured dynamical masses are all in binary systems, 
and were probably affected by mass transfer during a common 
envelope stage that reduced the mass of the resulting black holes”. 
This need not be the case in globular clusters, because black holes 
can form as single objects or in wide binaries, and then be exchanged 
into pre-existing binaries or tidally capture companions owing to the 
high stellar densities”. Globular cluster black holes also form at 
lower metallicity than in the field, leading to less mass loss from the 
progenitor and thus more massive remnants’. 

As mentioned above, the location of stars in a cluster also gives 
information about their masses. Stellar-mass black holes will mass- 
segregate to the core of the cluster. This process can be used to roughly 
estimate their masses by assuming thermalization, for which this 
relation holds: $m_{BH}/m_* = (r_c/r_{BH})^2$, where $m_{BH}$ and $r_{BH}$ are the 
characteristic black-hole mass and radius, $m_*$ is the typical stellar mass, 
and $r_c$ is the core radius. Assuming $m_* = 1\,M_\odot$ in the segregated cluster 
core and taking the observed values of $r_c = 1.24$ pc and $r_{BH} = 0.33$ pc, 
we estimate $m_{BH} \approx 15\,M_\odot$. 
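The thermalization estimate above is a one-line calculation. The sketch below assumes the mass ratio scales as the square of the radius ratio (the exponent is garbled in the source and is inferred here from the quoted numbers); the function name is illustrative.

```python
def segregated_mass(m_star, r_core, r_bh, exponent=2.0):
    """Characteristic black-hole mass from m_BH / m_* = (r_c / r_BH)**exponent.

    The exponent of 2 is an assumption; it is the value that brings the
    quoted radii close to the quoted mass of about 15 solar masses.
    """
    return m_star * (r_core / r_bh) ** exponent


# Values quoted in the text: m_* = 1 solar mass, r_c = 1.24 pc, r_BH = 0.33 pc.
m_bh = segregated_mass(m_star=1.0, r_core=1.24, r_bh=0.33)
print(f"m_BH is roughly {m_bh:.0f} solar masses")
```

With these inputs the result comes out slightly under the rounded figure in the text, which is consistent with the estimate being described as rough.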

The existence of black holes in a low-density globular cluster such as 
M22 constrains the magnitude of the initial velocity kicks received by 
the black holes at birth. The current central escape velocity of M22” is 
~34 km s$^{-1}$. This value may have been higher in the past, owing to a 
larger cluster mass and a more compact structure. Nonetheless, the 
retention of two black holes in a globular cluster with a modest escape 
velocity implies that the black holes could not have received large 
natal kicks. Large kicks are inferred for some stellar-mass black holes 
in the field’’. Low kick velocities can originate from supernovae if the 
black-hole mass is large, or if the black holes form from direct collapse 
with no supernovae. In either case, higher black-hole masses are 
favoured. 

The presence of black holes in a globular cluster can lead to an 
expansion of the core radius through interactions between black holes 
and stars. This could explain why M22 has the fifth-largest core radius 
among luminous (22x10°Lo) Milky Way globular clusters’’. 
Additional discussion can be found in Supplementary Information. 
Most theoretical models in the literature predict that only a single black 
hole (or black-hole/black-hole binary) will survive the dynamical 
processes by which black holes mass-segregate to the cluster centre, 
form an unstable subcluster, and evaporate’**. In some cases, more 
than one black hole may temporarily survive for an additional black- 
hole relaxation time (<1 Gyr), if the extra black holes are kicked into 
orbits outside the core’. Additional discussion can be found in 
Supplementary Information. 

In contrast to these’** theoretical predictions, M22 contains more 
than one black hole. In fact, it is possible that more than two black 
holes are present in M22, either as single black holes or in binary 
systems that are not undergoing observable mass transfer. Under the 
uncertain assumption that both of the M22 sources are black-hole/ 
white-dwarf binaries, published calculations can be used to estimate 
the fraction of surviving black holes that are actively accreting in 
present-day globular clusters”. Over 10 Gyr, 2-40% of black holes 
are expected to become members of binary systems with observable 
accretion. Our two observed sources thus suggest a total population of 
~5-100 black holes in M22. 
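The population estimate follows from dividing the number of observed accreting sources by the expected accreting fraction. A minimal sketch, assuming simple proportional scaling with no correction for detection completeness (the function name is hypothetical):

```python
def total_black_holes(n_observed, accreting_fraction):
    """Scale the observed accreting sources up to a total population."""
    return n_observed / accreting_fraction


N_OBSERVED = 2  # M22-VLA1 and M22-VLA2

# The text quotes 2-40% of black holes becoming observable accretors.
low = total_black_holes(N_OBSERVED, 0.40)
high = total_black_holes(N_OBSERVED, 0.02)
print(f"implied population: ~{low:.0f}-{high:.0f} black holes")
```

The wide range comes entirely from the uncertainty in the accreting fraction, not from the source count.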
Comet-like mineralogy of olivine crystals in an 
extrasolar proto-Kuiper belt 


B. L. de Vries, B. Acke, J. A. D. L. Blommaert, C. Waelkens, L. B. F. M. Waters, B. Vandenbussche, M. Min, G. Olofsson, 
C. Dominik, L. Decin, M. J. Barlow, A. Brandeker, J. Di Francesco, A. M. Glauser, J. Greaves, P. M. Harvey, 
W. S. Holland, R. J. Ivison, R. Liseau, E. E. Pantin, G. L. Pilbratt, P. Royer & B. Sibthorpe 


Some planetary systems harbour debris disks containing planetesi- 
mals such as asteroids and comets’. Collisions between such bodies 
produce small dust particles’, the spectral features of which reveal 
their composition and, hence, that of their parent bodies. A measurement 
of the composition of olivine crystals ($\mathrm{Mg_{2-2x}Fe_{2x}SiO_4}$) has 
been done for the protoplanetary disk HD 100546 (refs 3, 4) and 
for olivine crystals in the warm inner parts of planetary systems. 
The latter compares well with the iron-rich olivine in asteroids*® 
(x ~ 0.29). In the cold outskirts of the β Pictoris system, an analogue 
to the young Solar System, olivine crystals were detected’ but their 
composition remained undetermined, leaving unknown how the 
composition of the bulk of Solar System cometary olivine grains 
compares with that of extrasolar comets*”. Here we report the detec- 
tion of the 69-micrometre-wavelength band of olivine crystals in the 
spectrum of β Pictoris. Because the disk is optically thin, we can 
associate the crystals with an extrasolar proto-Kuiper belt at a distance 
of 15-45 astronomical units from the star (one astronomical unit is 
the Sun-Earth distance), determine their magnesium-rich composition 
(x = 0.01 ± 0.001) and show that they make up 3.6 ± 1.0 per cent 
of the total dust mass. These values are strikingly similar to those for 
the dust emitted by the most primitive comets in the Solar System, 
even though β Pictoris is more massive and more luminous and has a 
different planetary system architecture. 

The olivine crystals found in the Itokawa asteroid and in ordinary 
chondrites (types 4 to 6) have an iron-rich composition® (x ~ 0.29). In 
contrast, laboratory measurements of olivine crystals from unequili- 
brated bodies such as comet 81P/Wild 2 and cometary interplanetary 
dust particles show that these crystals have a range of compositions, 
but the distribution has a pronounced and sharp peak at the almost- 
pure magnesium-rich composition with x~ 0.01 (refs 8, 9). Both 
laboratory experiments"! and observations** show that crystal forma- 
tion in protoplanetary disks by gas-phase condensation, thermal 
annealing and shock heating results in magnesium-rich crystalline 
olivine’*’® (x<0.1). During the protoplanetary disk phase, these 
olivine crystals are incorporated into planetesimals. An example of a 
planetary system in which the olivine crystals are then freed from such 
planetesimals by collisions is the system of β Pictoris. This system is a 
young (~12 Myr) analogue to the Solar System, with at least one planet 
at a distance of ~10 au and a dusty debris disk containing small dust 
grains”’”°° (Fig. 1). 

We have detected (Fig. 1) the 69-μm spectral band of small (~2-μm; 
see Supplementary Information), crystalline olivine grains in 
the planetary system of β Pictoris using Herschel PACS. The 
69-μm band is of special interest because its exact peak wavelength 


and width are sensitive to both the grain temperature and, in particu- 
lar, the composition of the olivine crystals~*** (Fig. 2). From our model 
fitting of the 69-μm band and spectral bands at shorter wavelengths 
(Fig. 1), the temperature and total mass of the crystals are determined 
Figure 1 | Photometric and spectral observations of the planetary system of 
β Pictoris. a, Resolved surface brightness map of the β Pictoris debris disk at 
70 μm taken with the Herschel Space Observatory's Photodetector Array 
Camera and Spectrometer (PACS). The disk is barely resolved with PACS, 
which has a point-spread function with a full-width at half-maximum of 8.2" 
(hatched circle). b, Spitzer Space Telescope infrared spectrograph spectrum 
showing prominent olivine features (solid grey). The white solid line is our best 
model fit and the grey dashed line is the continuum. The uncertainties (1σ) in 
the Spitzer data are indicated in the figure. c, The flux-corrected PACS 
spectrum with error bars (1σ) showing the 69-μm band of crystalline olivine 
(solid grey; 12σ detection). The white solid line shows the model fit to the 
69-μm band of crystalline olivine as described in Supplementary Information, and 
the dashed grey line shows the underlying dust continuum. The best model 
contains crystalline olivine ($\mathrm{Mg_{2-2x}Fe_{2x}SiO_4}$) with x = 0.01 ± 0.001 (1σ) and a 
temperature of 85 ± 6 K (1σ). 
Figure 2 | Diagram demonstrating the dependence of the 69-μm band on 
grain temperature and composition. The diagram gives the width and central 
wavelength of the 69-μm band for six temperatures and for crystalline olivine 
($\mathrm{Mg_{2-2x}Fe_{2x}SiO_4}$) with x = 0.0 (black), x = 0.01 (red) and x = 0.02 (green). 
The width and central wavelength are measured by fitting Lorentzian profiles to 
laboratory measurements of crystalline olivine at different temperatures 
and compositions (see Supplementary Information for additional 
information). The width and wavelength positions measured show how the 
band broadens and shifts as a function of temperature or iron content. The best 
model fit of the 69-μm band of β Pictoris is indicated with a solid blue dot 
(Fig. 1b, c, white solid line); that is, the olivine crystals are cold (85 ± 6 K) and 
contain about 1% iron (x = 0.01 ± 0.001). 


to be 85 ± 6 K and (2.8 ± 0.8) × $10^{23}$ g, respectively. The exact wavelength 
position of the 69-μm band indicates very magnesium-rich 
crystalline olivine (x = 0.01 ± 0.001 (1σ)). The fraction of olivine crystals 
to the total amount of dust (obtained from the spectral energy 
distribution; see Supplementary Information) is 3.6 ± 1.0% (1σ). The 
temperature of 85 ± 6 K (1σ) places the population of olivine crystals 
between 15 and 45 au from the central star, which coincides with a 
strong increase in surface density in the disk**. This location is outside 
the snow line of the system, where icy, comet-like bodies can exist, such 
as in the Kuiper belt of the Solar System. Scaling the distances in the 
β Pictoris system to those of the Solar System according to the different 
luminosities of the two central stars, the extrasolar Kuiper belt of 
β Pictoris reaches into the temperature range of the Jupiter-Saturn 
region. We propose that this location is an inward extension of what 
will in time become an analogue of the Kuiper belt of the Solar System. 

The composition of the olivine crystals around β Pictoris is strikingly 
similar to that found in cometary bodies in the Solar System. 
From the low iron content, we can conclude that the olivine crystals we 
observe in β Pictoris come from collisions between unequilibrated, 
relatively small (<10-km) comet-like bodies. The magnesium-rich 
olivine crystals around β Pictoris are in stark contrast to the iron-rich 
crystalline olivine (x = 0.29) found in asteroid-like bodies in the Solar 
System. When we compare the crystalline olivine abundance found in 
β Pictoris (3.6 ± 1.0%) to that of primitive comets in the Solar System, 
similar low values are found. The comets 17P/Holmes and 
73P/Schwassmann-Wachmann, for example, contain ~2-10% crystalline 
olivine compared with the total amount of dust. Because 
olivine crystals can be formed only within 10 au of the central star, 
there must have been a transportation mechanism to bring these crys- 
tals to Kuiper belt distances. Studies of crystalline material and gas 
have indeed shown that radial mixing has taken place in both the Solar 
System and disks around young stars. Models are able to predict 
crystalline olivine abundances of 2-58% at radii beyond 10 au on 
timescales of ~1 Myr (refs 12, 30). The similar crystalline olivine 
abundances in β Pictoris and Solar System comets suggest that radial 
mixing must have been at work during the formation of the β Pictoris 
planetary system, with an efficiency similar to that in the protosolar 
nebula. 
Stabilizing Rabi oscillations in a superconducting 
qubit using quantum feedback 


R. Vijay, C. Macklin, D. H. Slichter, S. J. Weber, K. W. Murch, R. Naik, A. N. Korotkov & I. Siddiqi 


The act of measurement bridges the quantum and classical worlds 
by projecting a superposition of possible states into a single (proba- 
bilistic) outcome. The timescale of this ‘instantaneous’ process can 
be stretched using weak measurements’”, such that it takes the form 
of a gradual random walk towards a final state. Remarkably, the 
interim measurement record is sufficient to continuously track and 
steer the quantum state using feedback**. Here we implement 
quantum feedback control in a solid-state system, namely a super- 
conducting quantum bit (qubit) coupled to a microwave cavity’. A 
weak measurement of the qubit is implemented by probing the 
cavity with microwave photons, maintaining its average occupation 
at less than one photon. These photons are then directed to a high- 
bandwidth, quantum-noise-limited amplifier’®"', which allows real- 
time monitoring of the state of the cavity (and, hence, that of the 
qubit) with high fidelity. We demonstrate quantum feedback control 
by inhibiting the decay of Rabi oscillations, allowing them to persist 
indefinitely’*. Such an ability permits the active suppression of deco- 
herence and enables a method of quantum error correction based 
on weak continuous measurements’**. Other applications include 
quantum state stabilization*”’’, entanglement generation using 
measurement”, state purification’’ and adaptive measurements'*””. 

Feedback protocols in classical systems, from antilock brakes to pace- 
makers, use the outcome of a measurement to stabilize the system about 
a desired state. The operation of such feedback protocols is predicated on 
the idea that measurement does not alter the state of the system. This is 
no longer true in quantum mechanics, where measurement is necessarily 
invasive’. In the Copenhagen interpretation, a quantum object can exist 
simultaneously in more than one eigenstate of the measurement operator 
until observed, Schrödinger's celebrated 'dead-and-alive' cat being 
the quintessential hypothetical example”. The reality of the situation 
is established by the act of measurement, which forces the system 
‘instantaneously’ into one of these eigenstates in a probabilistic fashion 
(the ‘measurement back-action’). Therefore, this back-action must be 
accounted for when developing a feedback protocol to stabilize a 
quantum system, such as a qubit. 

One solution is to use weak measurements, where the rate ($\Gamma_{meas}$) at 
which information is extracted is deliberately limited, thereby slowing 
down the qubit's random walk towards an eigenstate. Integral to this 
scheme is a detector with efficiency $\eta_{det} = \Gamma_{meas}/\Gamma_\varphi \approx 1$, where $\Gamma_\varphi$ is the 
ensemble-averaged dephasing rate due to measurement back-action. 
The high detector efficiency allows us to track the qubit continuously, 
and steer it to a desired state using real-time feedback. 

The experimental set-up is shown in Fig. 1. Our quantum system 
(Fig. 1b) is an anharmonic oscillator realized by a capacitively shunted 
Josephson junction, dispersively coupled to a three-dimensional 
microwave cavity”. We use its two lowest energy levels to form a qubit 
(transmon) with a transition frequency of $\omega_{01}/2\pi = 5.4853$ GHz. The 
cavity resonant frequency with the qubit in the ground state is 
$\omega_c/2\pi = 7.2756$ GHz. The strongly coupled output port sets the cavity 
linewidth $\kappa/2\pi = 13.4$ MHz, and control and measurement signals are 
injected via the weakly coupled input port (Fig. 1a, b). The qubit-cavity 


coupling results in a state-dependent phase shift ($\Delta\phi = 2\tan^{-1}(2\chi/\kappa) = 12^\circ$, 
$\chi/2\pi = 0.687$ MHz) of the cavity output field, with the state 
information contained in one quadrature of the signal. The cavity 
output is sent to a near-noiseless ($\eta_{det} \approx 1$) phase-sensitive parametric 



Figure 1 | Experimental set-up. a, Signal generation set-up. One generator 
provides the Rabi drive at the a.c. Stark-shifted qubit frequency ($\omega_{01} - 2\chi\bar{n}$), 
and the output of another generator at 7.2749 GHz is split to create the 
measurement signal, paramp drive and local oscillator. The relative amplitudes 
and phases of these three signals are controlled by variable attenuators and 
phase shifters (not shown). I, in-phase component; Q, quadrature component; 
LO, local oscillator; RF, radio frequency. b, Simplified version of the cryogenic 
part of the experiment; all components are at 30 mK (except for the high- 
electron-mobility transistor (HEMT) amplifier, which is at 4 K). The combined 
qubit and measurement signals enter the weakly coupled cavity port, interact 
with the qubit and leave from the strongly coupled port. The output passes 
through two isolators (which protect the qubit from the strong paramp drive), 
is amplified and then continues to the demodulation set-up. The coherent state 
at the output of the cavity for the ground and excited states is shown 
schematically before and after parametric amplification. c, d, The amplified 
signal is homodyne-detected and the two quadratures are digitized (c). The 
amplified quadrature (Q) is split off and sent to the feedback circuit (d), where it 
is multiplied with the Rabi reference signal. The product is low-pass-filtered 
and fed back to the IQ mixer in a to modulate the Rabi drive amplitude. 
amplifier (paramp), which boosts the relevant quadrature to a level 
compatible with classical circuitry. The paramp output is further amp- 
lified and homodyne-detected (Fig. 1c) such that the amplified quad- 
rature (Q) contains the final measurement signal. 
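The state-dependent phase shift and the measurement drive frequency quoted for this set-up can be reproduced from the cavity parameters. The sketch below assumes the garbled expression reads as a phase shift of 2·arctan(2χ/κ) and that the drive sits one dispersive shift χ below the ground-state cavity frequency; the function name is illustrative.

```python
import math


def dispersive_phase_shift_deg(chi_mhz, kappa_mhz):
    """Phase difference between the two qubit states, 2*atan(2*chi/kappa), in degrees."""
    return math.degrees(2.0 * math.atan(2.0 * chi_mhz / kappa_mhz))


CHI_MHZ = 0.687          # dispersive shift chi/2pi, from the text
KAPPA_MHZ = 13.4         # cavity linewidth kappa/2pi, from the text
CAVITY_GHZ = 7.2756      # cavity frequency with the qubit in its ground state

phase_deg = dispersive_phase_shift_deg(CHI_MHZ, KAPPA_MHZ)
drive_ghz = CAVITY_GHZ - CHI_MHZ / 1000.0  # assumed: drive at cavity frequency minus chi

print(f"phase shift: {phase_deg:.1f} degrees")
print(f"measurement drive: {drive_ghz:.4f} GHz")
```

Both results land on the rounded figures in the text (about 12 degrees and 7.2749 GHz), which supports this reading of the symbols.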

We obtain Rabi oscillations with the cavity continuously excited at 
$\omega_r/2\pi = 7.2749$ GHz ($\omega_r \approx \omega_c - \chi$) with a mean cavity photon occupation 
($\bar{n}$) that controls the measurement strength (see Supplementary 
Information, section II, for calibration of $\bar{n}$). The Rabi drive at the a.c. 
Stark-shifted qubit frequency ($\omega_{01} - 2\chi\bar{n}$) is turned on for a fixed 
duration, $\tau_m$. The amplitude is adjusted to yield a Rabi frequency of 
$\Omega_R/2\pi = 3$ MHz. First we average $10^4$ measurement traces to obtain a 
conventional ensemble-averaged Rabi oscillation trace (Fig. 2a). Even 
though the qubit is continuously oscillating between its ground and 
excited states, the oscillation phase diffuses, primarily owing to measure- 
ment back-action. As a result, the averaged oscillation amplitude decays 
over time, but the frequency domain response retains a signature of these 
oscillations. We Fourier-transform the individual measurement traces 
and plot the averaged spectrum (Fig. 2b, blue trace). A peak, centred at 
3 MHz and with a full-width at half-maximum of $\Gamma/2\pi$, is observed and 
remains unchanged even when $\tau_m$ is much longer than the decay time of 
the ensemble-averaged oscillations. A plot of $\Gamma/2\pi$ for different measurement 
strengths (in units of $\bar{n}$) is shown in Fig. 2c. As expected in the 
dispersive regime, $\Gamma$ and $\bar{n}$ are linearly related. The vertical offset is 
dominated by pure environmental dephasing, $\Gamma_{env}/2\pi$, but has contributions 
from qubit relaxation ($T_1$) and thermal excitation into higher 
qubit levels; more details can be found in Supplementary Information, 
sections II and IV(C). 

The ratio of the height of the Rabi spectral peak to the height of the 
noise floor has a theoretical maximum value of four, corresponding 
to an ideal measurement with overall efficiency $\eta = 1$. For our set-up, 
this efficiency can be separated into two contributions as $\eta = \eta_{det}\eta_{env}$. 
The detector efficiency is given by $\eta_{det} = (1 + 2n_{add})^{-1}$, with $n_{add}$ being 
the number of noise photons added by the amplification chain. The 


added noise is referenced to the output of the cavity and includes the 
effect of signal attenuation between the cavity and the paramp. The effect 
of environmental dephasing, I’.n,y, is modelled using Neny = (1+ LD enyv/ 
I,) ‘. The best measurement efficiency we obtain experimentally is 
ny = 0.40, with nace = 0.46 and nen, = 0.87; more details can be found 
in Supplementary Information, section III. 
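
The efficiency budget above can be checked numerically. The following Python sketch is illustrative only: the function names are hypothetical, and it assumes that the rates quoted later in the paper (an environmental dephasing rate of 0.020 MHz and an optimal measurement strength of 0.134 MHz, both divided by 2π) describe the same operating point as the quoted efficiencies.

```python
def detector_efficiency(n_add):
    """Detector efficiency 1 / (1 + 2 n_add) for n_add added noise photons."""
    return 1.0 / (1.0 + 2.0 * n_add)


def environment_efficiency(gamma_env, gamma_m):
    """Environmental efficiency 1 / (1 + gamma_env / gamma_m).

    Both rates must share the same units (here MHz, after dividing by 2*pi).
    """
    return 1.0 / (1.0 + gamma_env / gamma_m)


def added_noise_photons(eta_det):
    """Invert the detector-efficiency relation to recover n_add."""
    return (1.0 / eta_det - 1.0) / 2.0


# Assumed inputs: rates taken from the text, detector efficiency as quoted.
eta_env = environment_efficiency(gamma_env=0.020, gamma_m=0.134)
eta_det = 0.46
eta_total = eta_det * eta_env
n_add = added_noise_photons(eta_det)

print(f"eta_env   = {eta_env:.2f}")
print(f"eta_total = {eta_total:.2f}")
print(f"n_add     = {n_add:.2f}")
```

Under these assumptions the product of the two contributions reproduces the quoted overall efficiency to two decimal places, and the quoted detector efficiency corresponds to roughly 0.6 added noise photons.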

We now discuss the quantum feedback protocol, which is motivated 
by the classical phase-locked loop used for stabilizing an oscillator. The 
amplified quadrature is multiplied by a Rabi reference signal with 
frequency Ω_0/2π = 3 MHz using an analogue multiplier (Fig. 1d). The 
output of this multiplier is low-pass-filtered and yields a signal 
proportional to the sine of the phase difference, θ_err, between the 3-MHz 
reference and the 3-MHz component of the amplified quadrature. 
This ‘phase error’ signal is fed back to control the Rabi frequency Ω_R 
by modulating the Rabi drive strength with an upconverting IQ mixer 
(Fig. 1a). The amplitude of the reference signal controls the 
dimensionless feedback gain, F, through the expression 
Ω_fb/Ω_R = −F sin(θ_err), where Ω_fb is the change in Rabi frequency due 
to feedback. Figure 2d shows the ensemble-averaged, feedback-stabilized 
oscillation, which persists for much longer than the original oscillation 
in Fig. 2a. In fact, within the limits imposed by our maximum data 
acquisition time of 20 ms, these oscillations persist indefinitely. The 
red trace in Fig. 2b shows the corresponding averaged spectra. The 
needle-like peak at 3 MHz is the signature of the stabilized Rabi 
oscillations. 
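
The multiply-and-filter step of a phase-locked loop can be illustrated with a noiseless toy model. The Python sketch below is not the analogue circuit used in the experiment. It assumes an idealized signal cos(Ωt + θ), a reference sin(Ωt), and a low-pass filter modelled as an average over an integer number of periods; the function names and the sign convention are hypothetical choices made so that the output equals sin(θ).

```python
import math


def phase_error(theta, f_rabi=3e6, n_periods=10, samples_per_period=200):
    """Estimate sin(theta) by multiplying a signal with a reference.

    signal    = cos(2*pi*f*t + theta)   (idealized oscillation component)
    reference = sin(2*pi*f*t)           (reference at the same frequency)
    The product contains a term at 2f and a constant term -sin(theta)/2.
    Averaging over whole periods acts as the low-pass filter.
    """
    n = n_periods * samples_per_period
    dt = 1.0 / (f_rabi * samples_per_period)
    acc = 0.0
    for k in range(n):
        t = k * dt
        signal = math.cos(2.0 * math.pi * f_rabi * t + theta)
        reference = math.sin(2.0 * math.pi * f_rabi * t)
        acc += signal * reference
    # Rescale so that the returned value is sin(theta).
    return -2.0 * acc / n


def feedback_ratio(theta, gain):
    """Relative Rabi-frequency correction, -F * sin(theta_err)."""
    return -gain * phase_error(theta)


for theta in (0.0, 0.3, math.pi / 2):
    print(f"theta = {theta:.3f}  error signal = {phase_error(theta):+.4f}")
```

In this toy model the error signal vanishes when the oscillation is in phase with the reference, and a positive phase error produces a negative correction to the Rabi frequency, which is the restoring behaviour expected of a phase-locked loop.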

Figure 2 | Rabi oscillations and feedback. a, We average 10⁴ measurement 
traces using weak continuous measurement and simultaneous Rabi driving to 
obtain ensemble-averaged Rabi oscillations that decay in time as a result of 
ensemble dephasing. b, Averaged Fourier transforms of the individual 
measurement traces from a. The spectrum shows a peak at the Rabi frequency 
(blue trace) with a full-width at half-maximum of Γ/2π. The grey trace shows 
an identically prepared spectrum for the squeezed quadrature (multiplied by 20 
for clarity), which contains no qubit state information. c, Γ/2π plotted as a 
function of cavity photon occupation, n̄ (measurement strength), showing the 
expected linear dependence. The vertical offset is dominated by pure 
environmental dephasing, Γ_env/2π, but has contributions from qubit relaxation 
(T₁) and thermal excitation into higher qubit levels. d, Feedback-stabilized, 
ensemble-averaged Rabi oscillations, which persist for much longer times than 
those without feedback (a). The corresponding spectrum, shown in b, has a 
needle-like peak at the Rabi reference frequency (red trace). The slowly 
changing mean level in the Rabi oscillation traces in a and d is due to the 
thermal transfer of population into the second excited state of the qubit. See 
Supplementary Information, section IV(C), for more details. 

To confirm the quantum nature of the feedback-stabilized oscillations, 
we perform state tomography on the qubit. We stabilize the dynamical 
qubit state, stop the feedback and Rabi driving after a fixed time 
(80 μs + τ_tomo after starting the Rabi drive), and then measure the 
projection of the quantum state along one of three orthogonal axes. This 
is done using strong measurements (by increasing n̄) with high single-shot 
fidelity. This allows us to remove any data points where the qubit was 
found in the second excited state (Supplementary Information, section 
IV(C)). By repeating this many times, we can determine ⟨σ_X⟩, ⟨σ_Y⟩ and 
⟨σ_Z⟩, the three components of the Bloch vector for the ensemble qubit state. 
Figure 3a shows a plot of the Bloch vector components for different 
time points (τ_tomo) over one oscillation period (2π). The Y and Z 
components are well fitted by a sinusoidal function, whereas the X 
component is nearly zero as expected for a coherent Rabi oscillation 
about the X axis. The imperfect efficiency of the feedback process is 
reflected in the non-unit amplitude of these oscillations. This feedback 
efficiency, D, is given by the time-averaged scalar product of the 
desired and actual state vectors on the Bloch sphere (Supplementary 
Information, section IV(A)). In our experiment, the measurement is 
weak enough that the stabilized oscillations are sinusoidal and D is 
approximately equal to the amplitude of these oscillations. 

Figure 3 | Tomography and feedback efficiency. a, Quantum state 
tomography of the feedback-stabilized state. We plot ⟨σ_X⟩, ⟨σ_Y⟩ and ⟨σ_Z⟩ for 
different time points τ_tomo in one full Rabi oscillation of the qubit. The solid 
lines are sinusoidal fits. The magnitude of these sinusoidal oscillations is 
approximately equal to the feedback efficiency, D = 0.45. b, D plotted as a 
function of the dimensionless feedback gain, F. Solid red squares show 
experimental data with a maximum value of D = 0.45. The dashed black line is 
a plot of equation (1) with η = 0.40 and Γ/2π = 0.154 MHz (n̄ = 0.47, 
Γ_env/2π = 0.020 MHz), whereas the solid black line is obtained from full 
numerical simulations of the Bayesian equations including finite loop delay 
(250 ns) and feedback bandwidth (10 MHz). 

In Fig. 3b, we plot D (red squares) versus the dimensionless feedback 
gain, F. We find a maximum value of D = 0.45 for the optimal choice of 
F. Ideally, feedback efficiency improves with measurement strength 
(η_env → 1) but requires correspondingly larger feedback loop bandwidth. 
Thus, in the presence of finite feedback bandwidth and loop delay, 
there exists an optimal measurement strength, which for our experiment 
was Γ_m/2π = 0.134 MHz. The dashed black line in Fig. 3b is a plot 
of the theoretical expression for D, given by 

$$ D = 2\left(\frac{1}{\eta}\,\frac{F}{\Gamma/\Omega_0} + \frac{\Gamma/\Omega_0}{F}\right)^{-1} \qquad (1) $$

and is derived using a simple analytical theory based on the Bayesian 
formalism for the qubit state trajectory (Supplementary Information, 
section IV(A)). This expression does not account for finite feedback 
bandwidth, loop delays in the circuit or qubit relaxation. The maximum 
value, D_max = √η, is obtained for an optimal feedback gain of 
F_opt = √η Γ/Ω_0. A value of D_max < 1 implies that the stabilized state is 
a mixed state; this occurs for η < 1, implying that we have incomplete 
information about the qubit state. To account for the finite loop 
delay (250 ns), feedback bandwidth (10 MHz) and qubit relaxation 
(T₁ = 20 μs), we performed full numerical simulations of the Bayesian 
equations for qubit evolution (Supplementary Information, section 
IV(B)). The results are shown as a black solid line in Fig. 3b and agree 
well with our experimental data. 
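
The stated maximum and optimal gain can be checked against a closed form for the feedback efficiency. The Python sketch below assumes the form D = 2 / (F/(η·Γ/Ω) + (Γ/Ω)/F), which is one expression consistent with the quoted D_max = √η at F_opt = √η·Γ/Ω. It should be read as an assumption rather than a transcription of the printed equation, and the function names are hypothetical. The parameter values are those quoted for the dashed line in Fig. 3b, with a Rabi frequency of 3 MHz.

```python
import math


def feedback_efficiency(gain, eta, gamma, omega):
    """Assumed analytical efficiency D(F) for ideal, delay-free feedback.

    gain  : dimensionless feedback gain F
    eta   : overall measurement efficiency
    gamma : total dephasing rate (same units as omega)
    omega : Rabi frequency
    """
    ratio = gamma / omega
    return 2.0 / (gain / (eta * ratio) + ratio / gain)


def optimal_gain(eta, gamma, omega):
    """Gain that maximizes the assumed D(F): sqrt(eta) * gamma / omega."""
    return math.sqrt(eta) * gamma / omega


eta, gamma, omega = 0.40, 0.154, 3.0  # rates in MHz, divided by 2*pi
f_opt = optimal_gain(eta, gamma, omega)
d_max = feedback_efficiency(f_opt, eta, gamma, omega)

print(f"F_opt = {f_opt:.4f}")
print(f"D_max = {d_max:.3f} (sqrt(eta) = {math.sqrt(eta):.3f})")
```

With these inputs the ideal maximum is about 0.63, noticeably above the measured 0.45, which is in line with the statement that the analytical expression ignores loop delay, finite bandwidth and qubit relaxation.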

In our experiment, even though the system being controlled is 
quantum and subject to measurement back-action, we essentially treat 
it as a classical oscillator and successfully apply a feedback protocol 
based completely on classical intuition. This can be done because the 
feedback signal achieves near-perfect cancellation of the random mea- 
surement back-action for optimal F. Although it is not true in general, 
in this particular scheme improvements in the feedback efficiency from 
full reconstruction of the quantum state are small. Furthermore, it is 
possible to approach a pure state with D = 1 by ensuring that η = 1 and 
eliminating feedback loop delay. 

We have demonstrated a continuous analogue feedback scheme to 
stabilize Rabi oscillations in a superconducting qubit, allowing them to 
persist indefinitely. The efficiency of the feedback is limited primarily 
by signal attenuation and loop delay, and could be improved in the 
near future with the development of on-chip paramps and cryogenic 
electronics to lessen the effects of attenuation and delay, respectively. 
We anticipate that our present technology can be extended to entangled 
qubits to provide another route to quantum error correction based on 
weak continuous measurements. Such methods might be advantageous 
in architectures where strong measurements can cause qubit 
state mixing. This development may be the start of a new era of 
measurement-based quantum control for solid-state quantum 
information processing. 


METHODS SUMMARY 


The transmon qubit was fabricated on a bare, high-resistivity Si wafer using 
electron-beam lithography and double-angle aluminium evaporation with an 
intervening oxidation step. The qubit is a single Josephson junction connecting 
two rectangular paddles (420 μm × 600 μm) that provide the shunting capacitance 
and coupling to the cavity. The cavity was machined out of 6061 aluminium alloy. 
The quality factor of the cavity was adjusted by controlling the length of the centre 
conductor of the SMA coaxial connector protruding into the cavity volume. These 
lengths were chosen to give strong coupling at the output port and weak coupling 
at the input port, resulting in a net power transmission on resonance of −20 to 
−30 dB. Qubit rotations around the X and Y axes of the Bloch sphere for state 
tomography were performed using resonant microwave pulses. The strong 
measurement used in state tomography was implemented with an 800-ns read-out pulse 
with an amplitude corresponding to a mean cavity occupation of n̄ ≈ 11. 

Sulphate-climate coupling over the past 300,000 years in inland Antarctica 

Yoshinori Iizuka, Ryu Uemura, Hideaki Motoyama, Toshitaka Suzuki, 
Takayuki Miyake, Motohiro Hirabayashi & Takeo Hondoh 


Sulphate aerosols, particularly micrometre-sized particles of sulphate 
salt and sulphate-adhered dust, can act as cloud condensation nuclei, 
leading to increased solar scattering that cools Earth’s climate. 
Evidence for such a coupling may lie in the sulphate record from 
polar ice cores, but previous analyses of melted ice-core samples have 
provided only sulphate ion concentrations, which may be due to 
sulphuric acid. Here we present profiles of sulphate salt and 
sulphate-adhered dust fluxes over the past 300,000 years from the 
Dome Fuji ice core in inland Antarctica. Our results show a nearly 
constant flux of sulphate-adhered dust through glacial and interglacial 
periods despite the large increases in total dust flux during 
glacial maxima. The sulphate salt flux, however, correlates 
inversely with temperature, suggesting a climatic coupling between 
particulate sulphur and temperature. For example, the total 
sulphate salt flux during the Last Glacial Maximum averages 
5.78 mg m⁻² yr⁻¹, which is almost twice the Holocene value. 
Although it is based on a modern analogue with considerable 
uncertainties when applied to the ice-core record, this analysis 
indicates that the glacial-to-interglacial decrease in sulphate would 
lessen the aerosol indirect effects on cloud lifetime and albedo, 
leading to an Antarctic warming of 0.1 to 5 kelvin. 

The climate of the past 430 kyr is guided by large-amplitude 100-kyr 
glacial-interglacial cycles. Such cycles are triggered by the Earth’s 
orbital changes and then amplified by various land–atmosphere 
interactions. Of these, the influence of aerosol radiative forcing is 
particularly hard to quantify accurately. The low aerosol concentrations 
of the southern high-latitude region make the direct effect of aerosol 
radiative forcing negligible, but the indirect effect, in which the 
aerosol acts as cloud condensation nuclei (CCN), could be meaningful. Many 
CCN consist of sulphate salt, either by itself or mixed with silicate 
materials (Supplementary Fig. 1), which after precipitating to the 
ground can be preserved in the ice. 

Aerosols in the Antarctic region contain little anthropogenic and 
ammonium sulphates, instead being dominated by sulphuric acid 
(H₂SO₄), sodium sulphate (Na₂SO₄) and calcium sulphate (CaSO₄). 
The Na₂SO₄ arises mostly from the reaction of sodium chloride (NaCl) 
with H₂SO₄ that comes mainly from marine biological activity. The 
CaSO₄ arises from terrestrial gypsum and also from a reaction in 
aerosol between H₂SO₄ and calcium carbonate. These salts primarily 
form during their transport through the atmosphere. Once in the ice, 
sulphate salts are good proxies for past atmospheric chemistry because, 
unlike the more mobile, volatile compounds such as nitric acid 
(HNO₃) and liquid H₂SO₄ (ref. 8), they are largely unaffected by 
post-depositional processes. 

Analyses of the sulphate ion (SO₄²⁻) record in Antarctic ice cores 
show that almost all non-seasalt sulphate ions in inland Antarctic ice 
cores come from marine biological activity. However, because the ice 
samples were melted before measurement, those studies could not 
definitively separate the H₂SO₄ flux from the sulphate salt (Na₂SO₄ 
and CaSO₄) flux. Even less is known about how much of the sulphate 
salt mixed with silicate mineral dust. Here we present a complete 
record of the sulphate salt and sulphate-adhered dust fluxes into inland 
Antarctica over the past 300 kyr. 

To determine the sulphate salt and seasalt fluxes (by sea salt we 
mean only NaCl), we combined two methods: an ion analysis method, 
here applied to the 300-kyr record in the Dome Fuji ice core, and an 
elemental and compositional analysis of single particles extracted from 
the same ice record, using a sublimation method. 

Figure 1 | Oxygen isotope and salt mass ratios of the past 300 kyr (before 
AD 2000) in the Dome Fuji ice core. a, Oxygen isotope ratio of ice (expressed 
as δ¹⁸O = (¹⁸O/¹⁶O)_sample/(¹⁸O/¹⁶O)_standard − 1, where the standard is Vienna 
Standard Mean Ocean Water) from ref. 27 (n = 691; uncertainty, <0.1‰). 
Grey bars indicate glacial maxima (Marine Isotope Stages 2, 4, 6b, 6d, 6f and 
8d). b, Na₂SO₄/CaSO₄ salt mass ratio from the energy-dispersive spectroscopy 
(EDS) method (single-particle analysis) applied to 38 filter samples (solid 
circles) and measured from ion concentrations of 691 melt samples (line). 
Vertical error bars are uncertainties (coefficient of variation) in the EDS ratios. 
Numerical values are in Supplementary Information. The uncertainty in the 
ion-deduced ratios ranges from 36 to 49% for average glacial and interglacial 
conditions. c, Same as b except that the data show the NaCl/Na₂SO₄ salt mass 
ratio. The uncertainty in the ion-deduced ratios ranges from 34 to 19% for 
average glacial and interglacial conditions. 

We found that particles (Supplementary Fig. 2) in glacial inceptions 
and interglacial periods can be distinguished from those in glacial 
maxima by their Na₂SO₄/CaSO₄ and NaCl/Na₂SO₄ mass ratios. 
During glacial maxima, Na₂SO₄/CaSO₄ values are particularly low, 
whereas NaCl/Na₂SO₄ values are high (Fig. 1). With a squared 
correlation coefficient of r² = 0.76 and a regression slope close to 1 
(0.85 ± 0.17 in Supplementary Fig. 3), the Na₂SO₄/CaSO₄ mass ratio 
correlates well with that from the ion chromatograph analyses. 
Because the ion-deduced CaSO₄ flux is well confirmed from the 
relation between Ca²⁺ and particle components in the Dome Fuji ice 
core, we can confidently use the ion-deduced Na₂SO₄ and CaSO₄ 
fluxes. The regression slope of NaCl/Na₂SO₄ equals 0.84 
(Supplementary Fig. 3), which, considering the Na₂SO₄/CaSO₄ slope of 
0.85, indicates that the ion chromatograph method underestimated the NaCl 
flux by up to 29% (1 − 0.85 × 0.84). This underestimate may be due to 
a poor understanding of mixed nitrate and chloride salt formation. 
Nevertheless, the high correlation (r² = 0.73) suggests that the NaCl 
variations during the 300-kyr time series can be reconstructed well. 
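
The bound on the NaCl underestimate follows from multiplying the two regression slopes. The short Python check below simply reproduces that arithmetic; the function name is hypothetical, and treating the product of the slopes as the recovered fraction is the reading implied by the expression quoted in the text.

```python
def combined_underestimate(slope_a, slope_b):
    """Fraction missed when two regression slopes below 1 compound."""
    return 1.0 - slope_a * slope_b


# Slopes quoted in the text: Na2SO4/CaSO4 (0.85) and NaCl/Na2SO4 (0.84).
underestimate = combined_underestimate(0.85, 0.84)
print(f"Maximum NaCl flux underestimate: {underestimate:.1%}")
```

The result, about 28.6%, rounds to the 29% quoted in the text.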

The resulting ion-chromatograph-derived CaSO₄ and NaCl fluxes 
to the ground surface vary greatly over the past 300 kyr, both showing 
high values during glacial maxima (Fig. 2). The high CaSO₄ flux agrees 
with the high dust flux from South America during these stages, 
which suggests that the higher dust flux increases Ca²⁺ sulphatization 
and thus consumes more H₂SO₄ in the atmosphere. 

During interglacials, both the NaCl flux and the Na₂SO₄ flux are 
low, probably because the reaction between NaCl and H₂SO₄ to 
produce Na₂SO₄ is limited by the low NaCl flux. Moreover, the 
NaCl/Na₂SO₄ ratio is low during interglacials and glacial inceptions but high 
in glacial maxima (Fig. 1), a trend predicted by the reaction between 
NaCl and H₂SO₄ (ref. 17). A low ratio in interglacials and glacial 
inceptions indicates excess H₂SO₄ in the Antarctic atmosphere 
(Fig. 2), causing the sulphate salt flux to be controlled mainly by seasalt 
flux. However, a high ratio in glacial maxima indicates excess seasalt 
flux, causing the sulphate salt flux to be controlled mainly by H₂SO₄ 
flux. Consistent with this interpretation, we measured very little 
H₂SO₄ during glacial maxima (Fig. 2), as nearly all SO₄²⁻ was 
associated with sulphate salt. In contrast, during interglacial periods and 
glacial inceptions, the H₂SO₄ contribution is relatively high, meaning 
that the SO₄²⁻ flux is not a good measure of the sulphate salt flux 
during these periods. 

We also estimated the flux of sulphate-adhered dust, that is, particles 
with both silicon and sulphate. Such particles probably act as CCN 
and thus influence climate. We refer to the collection of all particles 
containing silicon as the ‘total dust’, which is often measured in 
ice-core studies. Because the total dust flux correlates with the 
non-seasalt calcium ion flux, we examined the time variation of the ratio of 
sulphate-adhered dust to total dust in comparison with the ratio of 
sulphate salt flux to non-seasalt calcium ion flux (Supplementary Fig. 4). 
These ratios are correlated, both being high during glacial inceptions but 
low during glacial maxima. We use this correlation together with the 
measured total-dust flux from the melt samples to obtain the 
sulphate-adhered dust flux. 

The sulphate-adhered dust flux averages 0.10 mg m⁻² yr⁻¹ through 
glacial and interglacial periods (Fig. 3a). This constant trend occurs 
despite the total dust flux increasing by a factor of 50 (up to 
~5 mg m⁻² yr⁻¹) between interglacials and glacial maxima. Because of the 
constant trend, no significant correlation occurs between temperature 
and sulphate-adhered dust flux, suggesting that the sulphate-adhered 
dust has little influence on the glacial and interglacial temperature 
changes via the indirect effect of CCN. 

In contrast to that of the sulphate-adhered dust, the sulphate salt 
flux (another measure of CCN) correlates inversely with δ¹⁸O, a 
temperature proxy (Fig. 3b). This correlation suggests a coupling between 
temperature and particulate sulphur (through Na₂SO₄ and CaSO₄). 
The higher sulphate salt flux at lower temperatures also contrasts with 
the SO₄²⁻ flux, which shows no clear correlation with temperature. 

Figure 2 | Sulphate ion and salt fluxes. a, Isotopic data as in Fig. 1a. Marine 
Isotope Stages 1-8d are indicated at top. Open boxes (1, 5e and 7e) are 
interglacials, arrows (5d and 7d) are glacial inceptions and filled boxes (2, 4, 6b, 
6d, 6f and 8d) are glacial maxima. b, NaCl flux from ion concentration 
measurements after being verified by the single-particle (EDS) measurements. 
The uncertainty in the NaCl flux is 17% for average glacial conditions and 10% 
for interglacial conditions. c, Cumulative salt fluxes from ion concentration 
measurements after being verified by EDS measurements. Na₂SO₄, blue; 
CaSO₄, orange. The Na₂SO₄ flux uncertainty ranges from 17 to 10% for average 
glacial and interglacial conditions; the corresponding values for CaSO₄ range 
from 19 to 39%. d, The sulphate ion (SO₄²⁻) flux. The uncertainty in the 
SO₄²⁻ flux ranges from 17 to 5.7% for average glacial and interglacial 
conditions. e, Fractional contribution of sulphuric acid (H₂SO₄) and sulphate 
salt (Na₂SO₄ plus CaSO₄) to the total sulphur flux. 

Figure 3 | Correlations of relevant fluxes to the temperature proxy δ¹⁸O. 
a, Sulphate-adhered and total dust fluxes. The uncertainty in the 
sulphate-adhered dust flux ranges from 31 to 58% for average glacial and 
interglacial conditions; the corresponding values for the total dust flux are 
17 to 14%. b, Sulphate ion and sulphate salt fluxes. The linear best-fit line 
for sulphate salts is Flux = −0.448 δ¹⁸O − 20.9 with r² = 0.42 (n = 691). The 
correlation is significant (P < 0.001). (For sulphate ions, a linear fit gives 
an r² value of only 0.005.) The uncertainty in the sulphate ion flux ranges 
from 17 to 5.7% for average glacial and interglacial conditions; for the 
sulphate salt flux, the range is 17 to 12%. 
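
The best-fit line in the Fig. 3b caption can be evaluated directly to see what it implies for typical isotope values. The Python sketch below assumes that δ¹⁸O is expressed in per mil and the flux in mg m⁻² yr⁻¹. The two sample isotope values (−55‰ for warm and −60‰ for cold conditions) are illustrative choices, not values taken from the text, and the function name is hypothetical.

```python
def sulphate_salt_flux(delta18o_permil):
    """Evaluate the quoted linear fit, Flux = -0.448 * d18O - 20.9.

    Assumes d18O in per mil and flux in mg m^-2 yr^-1.
    """
    return -0.448 * delta18o_permil - 20.9


# Illustrative isotope values only (assumed, not from the text).
for label, d18o in (("warm", -55.0), ("cold", -60.0)):
    print(f"{label}: d18O = {d18o:.0f} per mil -> "
          f"flux = {sulphate_salt_flux(d18o):.2f}")
```

Under these assumptions, each 1‰ decrease in δ¹⁸O raises the fitted flux by 0.448 units, and the cold-case value is of the same order as the Last Glacial Maximum average quoted in the abstract. Because r² is only 0.42, the line describes a trend rather than a tight predictor.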


For the relatively warm interglacial periods, including the Holocene 
epoch, the evidence here suggests that Na₂SO₄ is generally controlled 
by the flux of sea salt from the ocean, which is relatively low. But in 
glacial maxima, the seasalt and dust fluxes increase, leading to greater 
particulate sulphur flux. (The flux of sulphate salt exceeds that of 
SO₄²⁻ because the salt has a higher molar mass.) During such cold 
periods, the amount of particulate sulphur is limited by the amount of 
SO₄²⁻. This SO₄²⁻ comes from marine biogenic sulphur, and marine 
biological activity thus influences the amount of sulphate salt. 

The coupling shown here between temperature and sulphate salt 
flux provides new evidence against which to test the long-debated 
hypothesis of Charlson et al. (CLAW). They proposed that marine 
biogenic sulphur provides a negative feedback to climate change. Such 
a feedback requires the sulphur to have a controlling effect on CCN 
production and requires biogenic sulphur production to respond to 
climate. Concerning the first requirement, a recent review shows that 
sulphur does not control CCN in the present interglacial period. Our 
evidence from the three most recent interglacials agrees with this view 
because our data indicate that sea salt rather than marine sulphur 
controls particulate sulphate flux. During glacial maxima, by contrast, 
the greater sulphate salt flux is determined by the marine sulphur flux, 
which agrees with the first requirement. Nevertheless, the CLAW negative 
feedback requires that the greater sulphate salt flux decreases marine 
sulphur emissions as a consequence of changes in cloud albedo and 
surface temperature, which is inconsistent with our finding that SO₄²⁻ 
flux does not decrease during glacial maxima (Fig. 3). Thus, the second 
CLAW requirement seems to fail even during glacial maxima. 

The correlation between temperature and sulphate salts suggests 
that the sulphate salts have an indirect aerosol effect on 
glacial-interglacial temperature changes. To obtain a rough estimate of the 
radiative forcing caused by the flux change of sulphate salt, we use recent 
radiative modelling results. To do so, we note that the ratio of sulphate 
concentration during the Holocene to that during the Last Glacial 
Maximum roughly equals the ratio of pre-industrial anthropogenic 
sulphate emissions to present anthropogenic sulphate emissions 
(Supplementary Fig. 5), and we assume that the sulphate salt 
concentration changes globally. This assumption is consistent with the finding 
that the change of dust flux in Antarctica correlates strongly with global 
dust flux. 

The resulting model gives net sulphate-induced radiative coolings 
in the Last Glacial Maximum of −1.85 W m⁻² (ref. 24) and 
−2.57 W m⁻² (ref. 25), producing global climate coolings of 0.1 K 
(ref. 24) and 2.24 K (ref. 25). By applying the polar amplification 
factor (1.3-2.3), the resulting cooling in Antarctica ranges from 0.1 
to 5 K as a result only of the indirect aerosol effect. The value 5 K is 
probably overestimated because other reports suggest lower radiative 
cooling. Considering that the temperature change between the Last 
Glacial Maximum and the Holocene in Antarctica is ~8 °C (ref. 26), 
our estimate highlights the large uncertainty in the climatic impact of 
indirect aerosol effects from sulphate salts. Thus, the role of sulphate 
salts and sulphate-adhered dust in producing the indirect effect should 
be quantified using palaeoclimate models with our new aerosol data 
and other proxies (for example temperature, CO₂, CH₄ and total dust) 
archived in the ice core. 
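
The quoted Antarctic cooling range can be reproduced by scaling the two global estimates by the polar amplification factor. The Python sketch below is a minimal arithmetic check. It assumes that the lower bound pairs the smaller global cooling with the smaller amplification factor and the upper bound pairs the larger values; the function name is hypothetical.

```python
def antarctic_cooling_range(global_coolings_k, amplification=(1.3, 2.3)):
    """Return (low, high) Antarctic cooling in kelvin.

    low  = smallest global cooling * smallest amplification factor
    high = largest global cooling  * largest amplification factor
    """
    low = min(global_coolings_k) * min(amplification)
    high = max(global_coolings_k) * max(amplification)
    return low, high


# Global coolings quoted in the text (refs 24 and 25), in kelvin.
low, high = antarctic_cooling_range([0.1, 2.24])
print(f"Antarctic cooling: {low:.2f} K to {high:.2f} K")
```

The bounds come out at roughly 0.13 K and 5.15 K, consistent with the rounded range of 0.1 to 5 K given in the text.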


METHODS SUMMARY 


To determine the amount of sulphate, chloride and mineral particles in the Dome 
Fuji ice core, we used a low-temperature sublimation and particle analysis 
method and compared the results with an analysis based on ion 
chromatography data obtained previously. That analysis determines the salt fluxes of 
CaSO₄, Na₂SO₄ plus MgSO₄, and NaCl plus MgCl₂ from ion concentrations. 
Results from the two methods agreed well for CaSO₄ and the 
Na₂SO₄-plus-MgSO₄ fluxes (Supplementary Fig. 3). 

The sublimation method removes volatiles such as H₂O, HCl, HNO₃ and H₂SO₄ 
(refs 11, 14). In total, we sublimated 38 sampled sections over the 300-kyr period 
and analysed them using scanning electron microscopy/energy-dispersive 
spectroscopy. Each sample yielded several hundred particles exceeding 0.4 μm 
in diameter (16,821 particles in total). To determine whether the particles had a 
mineral, a sulphate or a chloride component, we used a scheme from previous 
studies that divides non-volatile particles into insoluble and soluble 
components. We assumed that sodium and sulphur in a given particle came from Na₂SO₄, 
whereas sodium and chlorine came from NaCl and calcium and sulphur came 
from CaSO₄. Then we calculated the Na₂SO₄/NaCl and Na₂SO₄/CaSO₄ mass 
ratios following the method described in ref. 11. To obtain the number ratio of 
sulphate-adhered dust to total silicate dust particles, we assumed that a particle 
with both sulphur and silicon was sulphate-adhered dust and divided the number 
of such particles by the total number of particles with silicon. Supplementary 
Information lists the uncertainties. 

Natural and anthropogenic variations in methane sources during the past 
two millennia 

C. J. Sapart, G. Monteil, M. Prokopiou, R. S. W. van de Wal, J. O. Kaplan, 
P. Sperlich, K. M. Krumhardt, C. van der Veen, S. Houweling, M. C. Krol, 
T. Blunier, T. Sowers, P. Martinerie, E. Witrant, D. Dahl-Jensen & T. Röckmann 


Methane is an important greenhouse gas that is emitted from 
multiple natural and anthropogenic sources. Atmospheric methane 
concentrations have varied on a number of timescales in the past, but 
what has caused these variations is not always well understood. 
The different sources and sinks of methane have specific isotopic 
signatures, and the isotopic composition of methane can therefore 
help to identify the environmental drivers of variations in 
atmospheric methane concentrations. Here we present high-resolution 
carbon isotope data (δ¹³C content) for methane from two ice cores 
from Greenland for the past two millennia. We find that the δ¹³C 
content underwent pronounced centennial-scale variations between 
100 BC and AD 1600. With the help of two-box model calculations, we 
show that the centennial-scale variations in isotope ratios can be 
attributed to changes in pyrogenic and biogenic sources. We find 
correlations between these source changes and both natural climate 
variability (such as the Medieval Climate Anomaly and the Little Ice 
Age) and changes in human population and land use, such as the 
decline of the Roman empire and the Han dynasty, and the 
population expansion during the medieval period. 

In the pre-industrial period, methane (CH₄) sources can be divided 
into three categories on the basis of their stable carbon isotopic 
signatures: biogenic sources (for example, wetlands, rice paddies and 
ruminants, mean δ¹³C = −60 ± 5‰), geological sources (for example, 
mud volcanoes and microseepages, mean δ¹³C = −38 ± 7‰) and 
pyrogenic sources (for example, fires, biofuel and coal burning, mean 
δ¹³C = −22 ± 3‰). The isotopic composition of CH₄ in the 
troposphere is affected by emissions from these sources and by isotope 
fractionation in the sink mechanisms, primarily OH oxidation, with 
minor contributions from soil removal and stratospheric loss. 

Previous measurements of the δ¹³C value of CH₄ from air trapped in 
Antarctic ice cores challenged our understanding of the behaviour of 
pre-industrial CH₄ sources. They indicate that before AD 1500, the 
contribution of ¹³C-enriched CH₄ sources (for example, biomass 
burning) had to be larger than expected for pre-industrial conditions 
in order to explain the generally high δ¹³C levels during this period. 
After AD 1500, δ¹³C decreased by 2‰ until AD 1800, followed by an 
abrupt increase presumably caused by increased fossil fuel emissions 
associated with the onset of industrialization. Several hypotheses 
have been proposed to explain the δ¹³C minimum around AD 1800, 
including a decline in anthropogenic biomass burning in the Americas 
concurrent with colonial expansion, an early rise of ¹³C-depleted 
agricultural sources or a combination of both. 

Our high-resolution δ¹³C data from the NEEM (North Greenland 
Eemian Ice Drilling programme) and EUROCORE ice cores (see 
Methods) allow a more detailed reconstruction of global-scale changes 
in CH₄ sources over the past two millennia (Fig. 1a). Whereas the most 
distinctive feature of the isotopic record over this period is the 
minimum around AD 1800 (in agreement with previous studies), 
our measurements also reveal three centennial-scale excursions in 
δ¹³C between 100 BC and AD 1600 that were not resolved earlier 
(Fig. 1a). These excursions are superimposed on a slightly declining 
long-term trend in δ¹³C, which is accompanied by a long-term 
increase in CH₄ mixing ratios of about 70 p.p.b. between 100 BC and 
AD 1600 as observed in both Northern Hemisphere and Southern 
Hemisphere records (Fig. 1b). 

Figure 1 | Records of δ13C and mixing ratio of CH4 over the past two 
millennia. a, δ13C measurements on air trapped in Greenland ice cores from 
NEEM (black diamonds; this study), EUROCORE (blue diamonds; this study), 
GISPII* (green diamonds) and Antarctic ice cores from Law Dome’ (red 
diamonds) and the WAIS divide’ (orange diamonds). (1), (2) and (3) 
correspond to the three excursions in the Northern Hemisphere δ13C record 
(see main text). b, CH4 mixing ratio records from Greenland (GRIP**; black 
circles) and Antarctica (Law Dome’, red circles; WAIS’, orange circles). Each 
data point represents one measurement. Error bars represent ±1σ, based on the 
reproducibility of the measurement systems. 

We use a two-box model (see Supplementary Information) to infer 
possible source/sink variations that are consistent with the observed 
δ13C and CH4 data. In a first step, we vary single sources individually to 
identify the most important contributors to the isotope variations. The 
results show that the long-term increase in CH4 mixing ratios of 
70 p.p.b. (corresponding to a source change of 28 Tg CH4 between 
100 BC and AD 1600) cannot originate from geological and pyrogenic 
sources only, but must be primarily driven by changes in biogenic 
emissions (Supplementary Fig. 2), in agreement with recent model 
studies'*. The three centennial-scale δ13C excursions (Fig. 1a) could 
be caused by either a relative increase in the 13C-enriched sources 
(pyrogenic or geological sources) or a decrease in the 13C-depleted 
sources (biogenic sources). We consider short-term fluctuations in 
geologic emissions to be unlikely, so our discussion simplifies to 
biogenic versus pyrogenic sources. The additional constraint that no 
clear corresponding signal is observed in the CH4 mixing ratio implies 
that the isotope variations cannot be explained without variations in 
pyrogenic sources (Supplementary Fig. 3). Indeed, the larger the 
difference between the isotopic signature of a specific source and the 
global mean, the higher its ‘isotope leverage’, that is, the more effective 
a change in this source is in changing the overall isotopic composition. 
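
The notion of isotope leverage can be illustrated with a minimal sketch. It treats the atmosphere's source mix as a single flux-weighted mean, which is a simplification and not the two-box model of the Supplementary Information. The total source strength, the mean signature and the three source signatures below are assumed, illustrative values, not numbers taken from this study.

```python
def mean_signature_shift(total_source, mean_delta, source_delta, source_change):
    """Shift (per mil) of the flux-weighted mean source delta13C when one
    source changes by `source_change` (Tg/yr) and all others stay fixed."""
    new_total = total_source + source_change
    new_mean = (total_source * mean_delta + source_change * source_delta) / new_total
    return new_mean - mean_delta


# Assumed, illustrative values (hypothetical; not from the paper).
TOTAL = 200.0   # total CH4 source, Tg per year
MEAN = -54.0    # flux-weighted mean source signature, per mil
SIGNATURES = {"biogenic": -60.0, "pyrogenic": -22.0, "geological": -40.0}

for name, delta in SIGNATURES.items():
    shift = mean_signature_shift(TOTAL, MEAN, delta, 5.0)
    print(f"{name:>10}: {shift:+.3f} per mil for +5 Tg/yr")
```

Under these assumptions the same 5 Tg/yr increment moves the mean several times further when it comes from the pyrogenic source than from the biogenic one, because the pyrogenic signature lies much further from the mean.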

Whereas this single-source approach is suitable for qualitatively 
identifying the main drivers of the observed variability, it does not 
help in quantifying simultaneous changes in multiple sources. Thus, 
in a second step, we mathematically solve the CH4 and isotope mass 
balance equations to determine simultaneous variations in the 
biogenic and the pyrogenic sources that can explain the measured δ13C 
and CH4 mixing ratio data (see Supplementary Information). This 
multiple-source change approach shows that the centennial-scale 
δ13C excursions must be related to increased pyrogenic emissions that 
are balanced by a concomitant decrease in biogenic emissions. 
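
A rough sketch of the two-source mass balance follows. It is a single-box, steady-state simplification that ignores sink fractionation and inter-hemispheric exchange, so it is not the inversion described in the Supplementary Information. The function name, the source signatures and the before/after states are hypothetical values chosen only to show the algebra: one equation conserves total emissions, the other conserves the flux-weighted isotope signature.

```python
def solve_two_sources(s0, d0, s1, d1, delta_bio, delta_pyr):
    """Solve for simultaneous biogenic and pyrogenic source changes (Tg/yr).

    s0, s1: total source before and after (Tg/yr)
    d0, d1: flux-weighted mean source delta13C before and after (per mil)
    delta_bio, delta_pyr: assumed signatures of the two sources (per mil)

    Mass balance:     d_bio + d_pyr = s1 - s0
    Isotope balance:  delta_bio*d_bio + delta_pyr*d_pyr = s1*d1 - s0*d0
    """
    d_total = s1 - s0
    d_moment = s1 * d1 - s0 * d0
    d_pyr = (d_moment - delta_bio * d_total) / (delta_pyr - delta_bio)
    d_bio = d_total - d_pyr
    return d_bio, d_pyr


# Assumed example: the mean signature rises by 0.5 per mil while the total
# source (and hence the mixing ratio) stays unchanged.
d_bio, d_pyr = solve_two_sources(200.0, -54.0, 200.0, -53.5,
                                 delta_bio=-60.0, delta_pyr=-22.0)
print(f"biogenic change:  {d_bio:+.2f} Tg/yr")
print(f"pyrogenic change: {d_pyr:+.2f} Tg/yr")
```

With no change in the total, an isotope excursion towards heavier values can only be matched in this toy system by a pyrogenic increase offset by an equal biogenic decrease, which mirrors the qualitative conclusion of the text.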

In order to estimate the uncertainties in the reconstructed source 
variations, an error propagation study was carried out using a Monte 
Carlo approach. One thousand simulations were performed, in which 
model input parameters (observation errors, inter-hemispheric 
difference in CH4 and in δ13C, the geological source strength and 
isotope signatures of the sources) were perturbed randomly within 
their range of uncertainty (see Supplementary Information). The 
resulting source reconstructions fall within the light yellow and light 
green bands in Fig. 2. 
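
The Monte Carlo idea can be sketched as follows, again for a simplified single-box balance rather than the model used here. Only the number of runs (1,000) and the 0.12 per mil measurement precision come from the text; the size of the excursion, the total source and the signature ranges are assumptions made for illustration.

```python
import random


def pyrogenic_change(mean_shift, total, delta_bio, delta_pyr):
    """Pyrogenic increase (Tg/yr) that explains a shift in the mean source
    signature at constant total source, offset by an equal biogenic decrease."""
    return total * mean_shift / (delta_pyr - delta_bio)


def monte_carlo(n_runs=1000, seed=0):
    """Perturb the inputs within assumed uncertainties and return the
    2.5th percentile, median and 97.5th percentile of the result."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_runs):
        shift = rng.gauss(0.5, 0.12)           # excursion size (assumed) with 0.12 per mil error
        total = rng.uniform(190.0, 210.0)      # total source, Tg/yr (assumed range)
        delta_bio = rng.uniform(-62.0, -58.0)  # biogenic signature (assumed range)
        delta_pyr = rng.uniform(-25.0, -19.0)  # pyrogenic signature (assumed range)
        results.append(pyrogenic_change(shift, total, delta_bio, delta_pyr))
    results.sort()
    lower = results[int(0.025 * n_runs)]
    median = results[n_runs // 2]
    upper = results[int(0.975 * n_runs) - 1]
    return lower, median, upper


lower, median, upper = monte_carlo()
print(f"pyrogenic change: {median:.2f} Tg/yr (95% band {lower:.2f} to {upper:.2f})")
```

The spread between the lower and upper percentiles plays the role of the shaded uncertainty envelope around a reconstructed source curve.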

In order to further investigate the origin of the reconstructed variations 
in CH4 sources, we compared our pyrogenic and biogenic emission 
scenarios to the Northern Hemisphere charcoal index"! (based on 
charcoal accumulation measurements in sediments, and thus a good 
indicator of biomass burning changes in the past), to model-derived 
fire activity’’, to the rate of deforestation (estimates based on population 
data, remotely sensed images and land census)'*"*, to precipitation 
estimates’’, and to reconstructions of Northern Hemisphere’ and 
extratropical Northern Hemisphere (NEXT)’” temperatures (Fig. 2). 
The Northern Hemisphere charcoal index’ (Fig. 2c) shows three 
peaks in biomass burning between 100 BC and AD 1600 that qualitatively 
coincide in time with the three excursions in δ13C, but the overall 
correlation is weak. The rate of deforestation is a good indicator of 
anthropogenic variations in biomass burning because fire was the 
primary means for land maintenance and clearance (see Supplementary 
Information). We calculated pyrogenic CH4 emissions from fires 
used to clear land and to maintain cropland (see Supplementary 
Information and Supplementary Fig. 7). The results show that between 
100 BC and AD 1600, human activity may have been responsible 
for roughly 20–30% of the total pyrogenic CH4 emissions. The 
anthropogenic fraction increased over time, whereas natural pyrogenic 
emissions decreased. It is important to note that our estimates of the 
area under land use’? include intensification of land use due to new 
agricultural technologies and population pressure in the last centuries. 
Therefore the overall land use area increased more slowly than the 
population in the recent past. Other estimates of anthropogenic fire 
activity'”"* are based on constant per capita land use area. Using these 
data, CH4 emissions would scale directly with population, resulting in 
lower CH4 emissions from human activity in the past. 

Figure 2 | Reconstructed scenarios for pyrogenic and biogenic CH4 
emissions and other palaeoproxies between 100 BC and AD 1600. 
Uncertainty envelopes for the source reconstructions are derived from random 
parameter perturbations in the source inversion (see Supplementary 
Information). a, b, Pyrogenic (a) and biogenic (b) CH4 emissions. c, Northern 
Hemisphere transformed charcoal index"! (red curve) and fire activity 
estimates'* (model data give percentage of pre-industrial mean) caused by only 
natural wildfires (grey curve) and by both wildfires and anthropogenic fires 
(black curve). d, Northern Hemisphere rate of deforestation data from KK10" 
(dark blue), HYDE" (light blue solid line) and Pongratz'* (light blue dashed 
line). e, Northern Europe precipitation anomaly"». f, Northern Hemisphere 
temperature reconstruction’*® (pink) and Northern Hemisphere extratropics 
temperature (NEXT) reconstruction” (purple). Temperature difference ΔT is 
with respect to the 1961–90 temperature average. (1), (2) and (3) correspond to 
the three excursions in the Northern Hemisphere δ13C record. 

The values of δ13C are high at the beginning of our record and 
decrease around AD 200 (Fig. 1), which results in a decrease of pyrogenic 
sources and an increase in biogenic sources in our source reconstruction 
(Fig. 2a, b). Both the Northern Hemisphere charcoal index (Fig. 2c) and 
the NEXT temperature reconstruction’” (Fig. 2f) show decreasing 
values during this period, for which data on fire activity and precipitation 
are not available. The decrease in NEXT temperatures could 
have led to lower CH4 emissions from wildfires, but a causal link to 
increasing wetland emissions is less obvious. The data on the rate of 
anthropogenic deforestation also show a decrease around AD 200, 
which is related to drastic population declines in China and Europe 
following the fall of the Han dynasty and the decline of the Roman 
empire’’. Rapidly expanding industrial activity between 100 BC and AD 
200 in both Europe and East Asia has been reconstructed from 
heavy-metal pollution detected in a Greenland ice core’* and sedimentary 
records’”. During that time, charcoal was the preferred fuel for 
industrial and domestic purposes, yielding a large source of 
13C-enriched CH4 (ref. 20). Based on archaeological metal production 
estimates”’, we calculate that the charcoal used for metal production at 
the peak of the Roman empire alone could have produced 0.65 Tg yr−1 
of CH4. Contemporary civilizations in China and India had similar 
populations and even more sophisticated metal industries”’. Although 
specific estimates of metal production are highly uncertain, we suggest 
that this early industrial activity may have contributed a sizeable fraction 
to the ~5 Tg yr−1 extra pyrogenic CH4 emissions before AD 200, 
according to our model calculation. 

The second δ13C excursion (Fig. 1) correlates very well in timing and 
duration with the temperature maximum of the Medieval Climate 
Anomaly (MCA) that appears in both the Northern Hemisphere and 
NEXT temperature reconstructions (Fig. 2f) between AD 800 and AD 
1200. During this time, our reconstructed scenarios show a temporary 
decrease in biogenic sources and an increase in pyrogenic sources 
(Fig. 2). Widespread extended droughts’® associated with precipitation 
decrease in Northern Europe (Fig. 2e) provided favourable conditions 
for enhanced wildfire activity during the MCA. In addition, the 
medieval period was a time of accelerating deforestation (Fig. 2d) as 
a result of population expansion and urbanization in Europe and 
Asia, which coincides in time with the second δ13C excursion 
observed. Fire activity data (Fig. 2c) provide independent support 
for both natural and anthropogenic contributions to the increased 
pyrogenic emissions during this period. 

Interestingly, during the MCA the isotopic composition of CH4 
implies generally decreasing biogenic emissions. This seems to 
contradict the idea that high latitude Northern Hemisphere wetland CH4 
emissions acted as a positive climate feedback to increasing Northern 
Hemisphere temperatures, which would have led to a decrease in δ13C, 
contrary to our observations. Increases in methanogenesis rates caused 
by higher temperatures during this period may have been compensated 
by decreases in wetland area due to extended droughts”, leading to a 
net decrease in biogenic CH4 emissions. 

The third maximum in δ13C (Fig. 1) occurs simultaneously with the 
onset of the Little Ice Age (LIA). During this period, Northern 
Hemisphere temperatures and precipitation decreased (Fig. 2e, f). 
This may have led to unfavourable conditions for enhanced biogenic 
emissions. Increased fire activity and charcoal index values during the 
same period indicate that CH4 from natural wildfires may have made a 
significant contribution to the observed δ13C excursion. In addition, 
the rate of deforestation (Fig. 2d) shows an increase that is slightly 
delayed relative to the third δ13C excursion, suggesting more pyrogenic 
emissions caused by rapid land clearance during this period. 

The long-term trend in mixing ratios and δ13C values of CH4 from 
biogenic sources between 100 BC and AD 1600 (Fig. 1b) can have natural 
(primarily wetlands'***) and anthropogenic (primarily agricultural”) 
components. Recent model calculations* inferred an orbitally-controlled 
late Holocene increase in global CH4 levels, primarily driven by increases 
of Southern Hemisphere tropical natural wetland emissions, which 
according to this theory are linked to variability in monsoon patterns 
during the Holocene’. On the other hand, we show in Fig. 3 that the 
long-term trend in CH4 mixing ratio, that is, the increase between 
100 BC and AD 1800, is in very good agreement with reconstructed 
global anthropogenic land use’. This suggests that human activities, 
including the expansion of rice agriculture*’, played an important role 
in the observed long-term CH4 trend over the past two millennia. 

Figure 3 | Comparison between estimate of area under land use and CH4 
mixing ratio. CH4 mixing ratio records are from the Law Dome’ (red circles) 
and WAIS’ (orange circles) ice cores. The shaded area represents an 
uncertainty estimate on global area under land use (blue curve, from ref. 13). 
For details on methods used to construct the land use scenario, see 
Supplementary Information section 3. 

Our new isotope data from air trapped in Greenland ice cores allow 
the reconstruction of variations in different CH4 source categories over 
the past 2,100 years. The changes seen in our δ13C ice core record 
cannot be explained without variability in biomass burning, which 
correlates qualitatively with the charcoal index and fire activity data. 
In addition, we show that the reconstructed source variations are 
correlated with anthropogenic activities, in particular with long-term 
increases in agricultural emissions and with varying levels of biomass 
burning during the period of the Roman empire and the Han dynasty, 
the MCA and the onset of the LIA. It is thus likely that human activity 
contributed to variations in CH4 emissions to the atmosphere long 
before pre-industrial times. 


METHODS SUMMARY 


We analysed 47 ice samples from the NEEM ice core (North Greenland: site 
coordinates 77° 27’ N, 51° 4’ W) and 9 from the EUROCORE (Summit, Central 
Greenland: site coordinates: 72° 34’ N, 38° 27’ W) using a recently developed dry 
extraction procedure” followed by isotope ratio mass spectrometry. A layer of ice 
(3-5 mm) was microtomed from the surface of the ice to exclude contamination 
from the drilling liquid and possible anomalies caused by post-coring processes. 
Samples were not measured in chronological order. Contamination due to sample 
handling during coring and processing of the ice is very unlikely, because all 
samples were handled in the same manner and the observed δ13C variations follow 
systematic patterns rather than showing erratic behaviour. No similar temporal 
patterns are found in the recently obtained dust and ion records of the NEEM core 
(M. Bigler, personal communication), which makes in situ CH4 formation in the 
ice also unlikely. Moreover, in situ production is very likely to be biogenic and 
would lead to increases of isotopically depleted CH4, which are not observed in our 
profile. 

The precision of the δ13C measurements, based on standards and replicate ice 
core measurements, is 0.12‰. All δ13C values are reported versus VPDB (Vienna 
PeeDee Belemnite). We corrected our data for an inter-laboratory offset of 0.51‰ 
between our laboratory and Pennsylvania State University*”*, as determined in a 
recent intercalibration exercise using real ice core samples”. In addition, all isotope 
data were corrected for gravitational settling, and the four shallowest EUROCORE 
data were corrected for diffusive fractionation”. The age of each ice core air sample 
was calculated using a delta age (gas age minus ice age) of 183 years for NEEM” 
and 210 years for EUROCORE™. Modelled gas age distributions of NEEM and 
EUROCORE are presented in Supplementary Information. 


A Silurian armoured aplacophoran and implications for molluscan phylogeny 


Mark D. Sutton', Derek E. G. Briggs’, David J. Siveter*, Derek J. Siveter* & Julia D. Sigwart® 


The Mollusca is one of the most diverse, important and well-studied 
invertebrate phyla; however, relationships among major molluscan 
taxa have long been a subject of controversy’”’. In particular, the 
position of the shell-less vermiform Aplacophora and its relation- 
ship to the better-known Polyplacophora (chitons) have been 
problematic: Aplacophora has been treated as a paraphyletic or 
monophyletic group at the base of the Mollusca*®*, proximate to 
other derived clades such as Cephalopoda”*"®, or as sister group to 
the Polyplacophora, forming the clade Aculifera’*”"”. Resolution 
of this debate is required to allow the evolutionary origins of 
Mollusca to be reconstructed with confidence. Recent fossil 
finds'*"'© support the Aculifera hypothesis, demonstrating that 
the Palaeozoic-era palaeoloricate ‘chitons’ included taxa combining 
certain polyplacophoran and aplacophoran characteristics’. 
However, fossils combining an unambiguously aplacophoran-like 
body with chiton-like valves have remained elusive. Here we 
describe such a fossil, Kulindroplax perissokomos gen. et sp. nov., 
from the Herefordshire Lagerstätte’”"* (about 425 million years BP), 
a Silurian deposit preserving a marine biota’* in unusual 
three-dimensional detail. The specimen is reconstructed three- 
dimensionally through physical-optical tomography”. Phylogenetic 
analysis indicates that this and many other palaeoloricate chitons 
are crown-group aplacophorans. 


Phylum Mollusca 


Kulindroplax perissokomos Sutton, Briggs, Siveter, Siveter and 
Sigwart gen. et sp. nov. 


Etymology. kulindros (Greek; ‘roller’ or ‘cylinder’) plus plax (Greek; 
‘plate’), alluding to the cylindrical valve-bearing morphology; and 
perissokomos (Greek; ‘exceedingly hairy’), alluding to the pervasive 
spicule covering. Gender masculine. 

Holotype. Oxford University Museum of Natural History OUMNH 
C.29641, the only known specimen (Fig. 1a–m). 

Locality and horizon. Wenlock Series, Silurian; Herefordshire, 
England. 

Diagnosis for genus and species. Series of seven similar unarticulated 
valves, lacking ventral articulamentum. The head valve is shorter and 
much lower in height; the tail valve is taller. Valves with weak anterior 
jugal embayment, weakly obtuse apical angle and jugal angle near 90°. 
Posterolateral margins are weakly concave and serrated. All valves 
are mixoperipheral, with ventral apical area extending to over 25% 
of valve length. Valves lack ornament. The foot is absent and the girdle 
complete ventrally while open anteriorly and posteriorly; it bears 
concavo-convex blade-like spicules. A gill array of at least four pairs 
of elements is contained within the long posterior cavity. 
Description. The trunk is elongate, moderately curved ventrally 
(Fig. 1e) and bears seven morphologically similar valves (Fig. 1a, b, 
e) that overlap without articulating; these would have been in contact 
when the trunk was straight. In each valve, width > length > height 
(see Supplementary Note 1 for biometrics). Valves II–VI differ only 
slightly, but valve I is relatively short and low and valve VII (posterior 
terminal valve) is substantially taller (Fig. 1a, b, g, h). In transverse 
section the valves are keeled posteriorly and rounded anteriorly, with 
planar to weakly convex lateral areas and a jugal angle of approximately 
90° (Fig. 1g), except in valve I (approximately 120°). In lateral 
profile (Fig. 1e), the dorsal surface is weakly convex, the anterolateral 
margin is evenly curved and the posterolateral margin is straight to 
weakly concave. In dorsoventral views (Fig. 1a, b), the anterior margin 
shows a weak median jugal embayment. Lateral margins are straight 
to weakly convex, and anterolateral corners are well rounded. 
Posterolateral corners are sharp in posterior valves and more rounded 
in anterior valves, with angles increasing anteriorly from approximately 
80° (valve VII) to approximately 120° (valve I). Posterolateral 
margins are weakly concave, converging to a rounded beak with 
a weakly obtuse apical angle. Weakly defined linear depressions on the 
lateral areas of valves converge posteriorly at the apex (for example, 
Fig. 1a, valve V). Posterolateral margins are weakly serrated (Fig. 1i). 
No growth lines or dorsal ornament have been observed. There is a 
mixoperipheral fold around the posterior margin of all valves on the 
ventral internal surface (apical area, Fig. 1b, k). Valves are up to 0.4 mm 
thick away from apical areas. 

The girdle forms a cuticular cylinder 10.5 mm wide and 38 mm long 
(excluding valves and spicules), with open subplanar terminations 
(Fig. 1e, h); the anterior termination is flared ventrally. A foot is absent; 
a weak ventral ridge consisting of impersistently preserved thickening 
of ventromedial cuticle extends to both terminations (Fig. 1c, h, l). The 
cuticle is attached to the ventral part of the valves near their lateral 
margins (Fig. 1l) from anterior of apical area to near the posterolateral 
corners, except in the anteriormost valve in which attachment 
(preserved on right only, Fig. 1h) does not extend anterior of mid-length. 
Where attached, the cuticle displays reflexion (laterally concave); 
between attachments it is subcircular in transverse section (Fig. 1c, l; 
note that Fig. 1c is an oblique section). Densely packed spicules project 
subnormally from the entire girdle surface except where it attaches to 
valves, and along the ventral cuticular ridge (Fig. 1c–e, h, j, m). Spicules 
are blade-like with a sub-semicircular cross-section (Fig. 1d); they are 
convex anteriorly and weakly recurved to point posteriorly (Fig. 1j, m). 
Spicules are 1–2 mm long, with larger spicules occurring laterally. 
Spicules taper to pointed tips (Fig. 1m); some evenly, others maintaining 
maximum breadth for much of their length. 

The body mass is shrunken owing to decay and appears undifferentiated; 
valves III–VII preserve the attachment (Fig. 1b). Body mass 
approaches (but is not attached to) the ventral cuticle only near valve 
V; elsewhere it slopes dorsally. The surface of the mass is uneven, 
asymmetrical and marked by subcircular ‘pockmarks’ (for example, Fig. 1k). An 
anterior dorsomedian linear depression may represent the position of 
the gut (Fig. 1b). No radula is preserved. 

A posterior cavity contains poorly preserved filament-bearing 
structures that are interpreted as elements of a gill array comprising 
four element pairs (Fig. 1b, f, g, k). These are described in detail in 
Supplementary Note 3. 

Figure 1 | Oxford University Museum of Natural History 
(OUMNH) C.29641: holotype of Kulindroplax perissokomos. a, b and 
e–m are ‘virtual’ reconstructions. a, Dorsal stereo pair. b, Ventral stereo pair 
(cuticle and spicules removed). c, Photograph of the specimen before serial 
grinding (that is, section along primary split). Dashed box indicates the position 
of d. d, Detail of c showing spicules. e, Lateral stereo pair. f, Dorsal stereo pair of 
gill array (cuticle, spicules, valves and body removed). g, Posterior view (cuticle 
and spicules removed). h, Ventral view. i, Detail of e (rotated) showing serrated 
margin of valve IV. j, Representative piece of spicule-bearing cuticle, from point 
labelled ‘ss’ in e. Posterolateral view (ventral up). k, Ventrolateral view of 
posterior of specimen (cuticle and spicules removed). l, Oblique (subventral) 
view (spicules removed). m, Sub-posterior view (ventral upright) of cuticle 
piece from j with all but three spicules removed. Scale bars, 2 mm (virtual 
reconstructions are perspective views and scale decreases away from viewer; 
where depth of object is substantial, the scale is calculated for the valve closest to 
the viewer). aa, apical area; bm, body mass; cu, cuticle; e1–e4, elements 1–4 of 
gill array; fs, fine-cut (~0.3 mm material removed); g, gut trace; pm, 
‘pockmark’; ps, primary split (crack between part and counterpart); sc, saw-cut 
(~2 mm material removed); sp., spicules; ss, spicule sample; v1–v7, valves I 
(head)–VII (tail); vr, ventral ridge. 


Discussion. The morphology of Kulindroplax provides new and 
clear documentation of the flexibility and disparity of the palaeoloricate 
bauplan*'*"'®, as well as supporting a natural clade Aculifera. Two other 
Palaeozoic fossils show both polyplacophoran- and aplacophoran-like 
characteristics: Acaenoplax 
and the less well-preserved Ordovician Phthipodochiton thraivensis. 
The arrangement and morphology of the valves of Acaenoplax differ 
from those in living polyplacophorans and their fossil representatives, 
and some features of the body have no direct equivalent in living 
aplacophorans’*"*7°*!. The valves of Phthipodochiton, in contrast, are typical 
of palaeoloricate ‘polyplacophorans’; this taxon also has a ventrally 
complete (or near-complete) girdle, although this feature is not 
sufficiently well preserved for satisfactory characterization. Kulindroplax 
represents the first unambiguous combination of palaeoloricate valves 
and aplacophoran body, and hence is the clearest known fossil link 
between the Polyplacophora and Aplacophora. The valves of 
Kulindroplax are closely comparable to those of the palaeoloricate 
Chelodes”, and of Phthipodochiton. However, the cuticle, sub-circular 
cross-section, complete coat of spicules and wide posterior cavity 
housing respiratory organs are aplacophoran features. Phthipodochiton 
has been compared with predatory solenogasters (neomeniomorphs)*””. 
Several characters of Kulindroplax, in contrast, invite comparison 
with the caudofoveate (chaetoderm) aplacophorans. These include 
the absence of a pedal groove or pit, the posteriorly (as opposed 
to posteroventrally) opening respiratory cavity, and the gill array 
with elements that are more closely comparable to the ctenidia of 
caudofoveates than the gill folds of the solenogasters, although there 
seem to be four pairs rather than one. 

All valve-bearing aculiferans possess eight dorsal valves where 
this number can be determined, except for Acaenoplax'** and 
Multiplacophora’’. Kulindroplax is unique in its chiton-like scleritome 
(that is, a single row of mixoperipherally grown overlapping dorsal 
valves) with only seven elements (see Supplementary Note 4, character 
14), although there is a parallel in the seven putative shell-fields 
described in one postlarval solenogaster”*. 

The phylogenetic position of Kulindroplax was investigated using a 
modified version of the data matrix described in ref. 5. The a priori 
weighting scheme of these authors is replaced here with a posteriori 
weighting schemes*>”®, although results under a priori weighting are 
similar. Supplementary Note 4 provides full details of the new matrix, 
the methods used and the results of different variants of our analysis. Our 
tree topology (Fig. 2) is similar to that depicted in ref. 5, recovering clades 
including the halwaxiids’” (Wiwaxia, Halkieria and Orthrozanclus), 
Neoloricata and Aculifera (Aplacophora and Polyplacophora). 
Aculifera falls out as sister group to Conchifera, the latter represented 
by the monoplacophoran Neopilina, the early crown-group bivalves 
Pojetaia and Fordilla, and an Ordovician bivalve (Babinka) that had 
eightfold iteration of adductor muscles”. The recovery of Aculifera 
and Conchifera in all variants of our analysis demonstrates the strong 
morphological support that fossil data now offer for the Aculifera 
hypothesis, paralleling recent molecular evidence favouring this 
topology'*'"?. The Serialia hypothesis for a clade comprising 
monoplacophorans and polyplacophorans”® is rejected by present fossil data. 

Figure 2 | Summary of parsimony analysis results. Asterisk denotes extinct 
taxa. See Supplementary Note 4 for details and full version of tree. 

Aplacophora (represented by Chaetoderma and Epimenia) is also 
recovered, but instead of occupying a derived position relative to 
related valve-bearing fossils, the extant aplacophoran genera occur 
relatively stemwards. Epimenia (a solenogaster) resolves as more 
primitive than Chaetoderma (a caudofoveate). This result is com- 
patible with neontological inferences, developed under the Aculifera 
hypothesis, that caudofoveates are relatively derived’. 

Evidence for the mode of life of Acaenoplax"* and Phthipodochiton” 
suggests a creeping epifaunal habit, but the concavo-convex spicules of 
Kulindroplax seem to be well adapted as sediment ratchets that could 
have facilitated forward movement through a substrate. Although 
these spicules occur all over the cuticle, the dorsal valves are not easy 
to reconcile with a fully infaunal (caudofoveate-like) autecology; 
Kulindroplax may instead have been semi-infaunal, probably faculta- 
tively, retaining a requirement for dorsal armour. 

Kulindroplax, Acaenoplax, Phthipodochiton and other palaeoloricates 
fall within the aplacophoran crown-group in the results from all but 
one of our analyses (the exception neither recovers nor excludes this 
position; see Supplementary Note 4). Thus, the presence of a foot 
cannot be inferred for any animal with chiton-like palaeoloricate 
valves. These topologies also indicate that valve loss occurred indepen- 
dently in Solenogastres and Caudofoveata: it is not a synapomorphy of 
Aplacophora. Aplacophorans were primitively shelled molluscs; the 
few living forms represent the survivors of a diverse valve-bearing 
Palaeozoic clade. 


METHODS SUMMARY 


Morphology was reconstructed digitally following serial grinding at 30-µm 
intervals; compressed versions of the raw data are available for download as 
Supplementary Data 2. Some material has been lost to saw-cuts in processing; 
see Supplementary Note 2. The full reconstructed model is available for download 
and viewing in VAXML format” as Supplementary Data 1. See Supplementary 
Note 4 for details of the phylogenetic analyses. 


A transcriptomic hourglass in plant embryogenesis 


Marcel Quint’, Hajk-Georg Drost*, Alexander Gabel’, Kristian Karsten Ullrich', Markus Bonn”? & Ivo Grosse 


Animal and plant development starts with a constituting phase 
called embryogenesis, which evolved independently in both 
lineages’. Comparative anatomy of vertebrate development—based 
on the Meckel-Serres law’ and von Baer’s laws of embryology’ from 
the early nineteenth century—shows that embryos from various 
taxa appear different in early stages, converge to a similar form 
during mid-embryogenesis, and again diverge in later stages. This 
morphogenetic series is known as the embryonic ‘hourglass’**, and 
its bottleneck of high conservation in mid-embryogenesis is 
referred to as the phylotypic stage®. Recent analyses in zebrafish 
and Drosophila embryos provided convincing molecular support 
for the hourglass model, because during the phylotypic stage the 
transcriptome was dominated by ancient genes’ and global gene 
expression profiles were reported to be most conserved’. 
Although extensively explored in animals, an embryonic hourglass 
has not been reported in plants, which represent the second major 
kingdom in the tree of life that evolved embryogenesis. Here we 
provide phylotranscriptomic evidence for a molecular embryonic 
hourglass in Arabidopsis thaliana, using two complementary 
approaches. This is particularly significant because the possible 
absence of an hourglass based on morphological features in plants 
suggests that morphological and molecular patterns might be 
uncoupled. Together with the reported developmental hourglass 
patterns in animals, these findings indicate convergent evolution 
of the molecular hourglass and a conserved logic of embryogenesis 
across kingdoms. 

In flowering plants, embryogenesis can be separated into three major 
phases. The early phase is characterized by asymmetric cell divisions to 
establish apical–basal polarity. In the intermediate phase, major organs 
and primordia are initiated, which expand in the late phase to the 
mature embryo””°. One notable difference between embryogenesis in 
animals and plants concerns the establishment of morphological vari- 
ation between taxa. For example, vertebrates develop morphological 
variation in late embryogenesis, whereas differences between flowering 
plant taxa are only established during post-embryonic development. 
Inspired by the historical relevance of the embryonic hourglass model 
in animals, by recent transcriptional support from studies in zebrafish’ 
and Drosophila’”*, and by the absence of any reported anatomical evid- 
ence for such a pattern during plant embryogenesis, we assess the 
possible existence of a transcriptional hourglass during embryogenesis 
of the plant reference species A. thaliana. 

Recently, genome-wide expression profiles of a complete develop- 
mental series from the zygote to the mature embryo in A. thaliana were 
obtained'’. To investigate the presence of an embryonic hourglass in 
plants, we combine this transcriptome information with two different 
measures of evolutionary distance: evolutionary age and sequence 
divergence. We compute two different transcriptome indices for each 
gene, the transcriptome age index (TAI)’ based on evolutionary age, 
and the transcriptome divergence index (TDI) based on sequence 
divergence. We investigate the profiles of these two transcriptome 
indices across the seven sampled embryo stages, and ask if and to what 
degree they show an hourglass pattern similar to that found for 
zebrafish’ or Drosophila’®. 


For calculating the TAI, we assign an evolutionary age to each gene 
in the A. thaliana genome by sorting each gene into its phylostratum, 
defined as the most distant phylogenetic node containing at least one 
species with a detectable homologue (Methods, Supplementary Fig. 1, 
Supplementary Tables 1 and 2). The resulting phylostratigraphic 
map” contains 13 phylostrata, PS1-PS13 (Fig. 1a). PS1 includes the 
evolutionarily oldest genes with homologous sequences in prokaryotes, 
and PS13 includes the evolutionarily youngest genes with no homologue 
in any other species. 

Figure 1 | Evolutionary age and sequence divergence of A. thaliana genes. 
a, Phylostratigraphic map of A. thaliana. Numbers in parentheses denote the 
number of genes per phylostratum (PS1-PS13). Cell. org., cellular organisms 
described by PS1. b-e, Scatter plots of phylostratum versus Ka/Ks ratios over all 
genes. Ka/Ks ratios are derived from orthologous genes between A. thaliana and 
b, A. lyrata, c, T. halophila, d, C. rubella and e, B. rapa. Kendall τ values denote 
the Kendall rank correlation coefficients measuring the association between 
both parameters. See Methods for details. 

For calculating the TDI, we determine the sequence divergence 
between A. thaliana and its sister species Arabidopsis lyrata or any 
one of the closely related Brassicaceae species Brassica rapa, Capsella rubella 
and Thellungiella halophila, by computing the Ka/Ks ratio (Supplementary 
Table 3). Here Ka is the number of non-synonymous substitutions 
per non-synonymous site and Ks is the number of synonymous 
substitutions per synonymous site for each orthologous gene pair. The 
Ka/Ks ratio is an indicator of selective pressure within protein coding 
regions and, thus, reflects natural selection, one of the major forces 
driving molecular evolution. Interestingly, evolutionary age and 
sequence divergence as quantified above show only weak correlations 
(Kendall’s rank correlation coefficient ranging from 0.02 to 0.26; 
Fig. 1b-e), indicating that both measures of evolutionary distance 
can be regarded as complementary (Supplementary Note). 
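
The weak association between evolutionary age and sequence divergence can be illustrated with a rank correlation. The following is a minimal sketch, not the authors' code: it computes Kendall's rank correlation with a tie correction (the tau-b variant). Which variant was used is not stated in the text, so that choice, the function name and the toy values are assumptions.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall rank correlation with tie correction (tau-b).

    Assumption: tau-b is chosen because phylostratum values are heavily
    tied; the text does not say which variant was computed.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = tied_x_only = tied_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both variables: counts towards neither term
        if dx == 0:
            tied_x_only += 1
        elif dy == 0:
            tied_y_only += 1
        elif (dx > 0) == (dy > 0):
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    denom = sqrt((untied + tied_x_only) * (untied + tied_y_only))
    return (concordant - discordant) / denom if denom else float("nan")


# Hypothetical toy data: phylostratum and Ka/Ks ratio of five genes.
toy_ps = [1, 1, 2, 3, 4]
toy_ka_ks = [0.1, 0.2, 0.15, 0.3, 0.5]
tau = kendall_tau_b(toy_ps, toy_ka_ks)
```

The loop visits every gene pair, so it is only practical for small inputs; for a genome-scale table a library implementation would be the sensible choice.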

In combination with transcript information, the TAI quantifies the 
mean evolutionary age of a transcriptome, where the evolutionary age 
(phylostratum) of each gene is weighted by its expression level’. 
Analogously, we define the TDI as the mean sequence divergence of 
a transcriptome, where the sequence divergence (Ka/Ks) of each gene is 
weighted by its expression level (Methods). 
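
Both indices are expression-weighted means, so one helper covers them. The sketch below is illustrative only; the function name and the toy numbers are hypothetical and not taken from the paper.

```python
def transcriptome_index(gene_values, expression):
    """Expression-weighted mean of a per-gene value for one stage.

    gene_values: phylostratum (gives the TAI) or Ka/Ks ratio (gives the TDI).
    expression:  expression level of the same genes at one stage.
    """
    if len(gene_values) != len(expression):
        raise ValueError("one expression value is needed per gene")
    total = sum(expression)
    if total <= 0:
        raise ValueError("total expression must be positive")
    return sum(v * e for v, e in zip(gene_values, expression)) / total


# Hypothetical toy data: three genes measured at two stages.
phylostrata = [1, 4, 13]
ka_ks = [0.1, 0.2, 0.5]
expression = {"zygote": [10, 5, 5], "torpedo": [10, 5, 1]}

tai = {s: transcriptome_index(phylostrata, e) for s, e in expression.items()}
tdi = {s: transcriptome_index(ka_ks, e) for s, e in expression.items()}
```

In this toy case only the young, divergent gene (PS13, Ka/Ks 0.5) is expressed at a lower level in the second stage, and both indices drop as a result.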

Figure 2 and Supplementary Fig. 2 show the TAI and TDI profiles 
across the seven sampled embryo stages of A. thaliana. We find that 
transcriptomes of early plant embryonic stages such as zygote and 
quadrant are evolutionarily young (high TAI), transcriptomes of the 
mid-embryogenic phase ranging from the globular to the torpedo 
stage are older (low TAI), and transcriptomes of later stages of 
embryogenesis are younger again (Fig. 2a). Qualitatively, this TAI 
profile strikingly resembles the molecular hourglass pattern discovered 
for zebrafish and Drosophila’. Likewise, we find that transcriptomes of 
early stages are divergent (high TDI), transcriptomes of the mid- 
embryogenic phase are more conserved (low TDI), and transcriptomes 
of later stages of embryogenesis are more divergent again (Fig. 2b). 
Remarkably, the TDI profile qualitatively resembles the molecular 
hourglass pattern of the gene expression divergence profile discovered 
for Drosophila® and recently also Caenorhabditis”. 

Figure 2 | Transcriptome indices across A. thaliana embryogenesis. a, The 
transcriptome age index (TAI) profile. b, The transcriptome divergence index 
(TDI) profile. Embryo stages: Z, zygote; Q, quadrant; G, globular; H, heart; T, 
torpedo; B, bent cotyledon; M, mature. Representative drawings (not on the 
same scale) are given for each sampled embryo stage. The blue shaded area 
marks the predicted phylotypic stage. The grey lines represent the standard 
error estimated by bootstrap analysis. The overall patterns of the TAI and TDI 
profiles are highly significant, as measured by permutation tests 
(Pray = 2.55 X 10-1; Ppp, = 2.15 X 10 °°). See Methods for details. 

Comparing both profiles, we make two observations. First, each of 
the profiles shows an hourglass pattern, where the TAI reflects long-term 
evolutionary changes covering 4 billion years since the origin of 
life, and the TDI reflects short-term evolutionary changes covering 
roughly 5-16 million years since the divergence of A. thaliana and 
the other four Brassicaceaes'*"” (Supplementary Note). Second, both 
profiles point to the torpedo stage as the predicted phylotypic stage, 
representing simultaneously the stage with the oldest as well as the 
most conserved/least divergent transcriptome. An independent, but 
comparable transcriptome dataset'*”’ from A. thaliana (Supplemen- 
tary Fig. 3), which likewise covers embryogenesis from early phases to 
the mature embryo, confirms the hourglass pattern for both indices 
(Supplementary Fig. 4). Together, these observations suggest the 
possibility of convergent evolution of a molecular embryonic 
hourglass in animals and A. thaliana, and make it tempting to conjec- 
ture its universal presence across animal and plant kingdoms. 

Given that developmental processes during plant and animal 
embryogenesis can be very different from the zygote stage on”, and 
that the embryonic hourglass must have evolved independently in 
plants and animals, we wish to understand how the torpedo stage as 
the bona fide phylotypic stage in A. thaliana relates to animal phylotypic 
stages. 

Across different animal taxa, the phylotypic stage was defined as the 
stage at which all major body parts are represented at their final 
positions as undifferentiated cell condensations”’. In relation to this 
ontogenic progression in animals, the mid-embryogenic transition 
from the globular to the heart stage may conceptually serve as the 
corresponding stage in flowering plants. Here, polar axes are established 
and shoot and root apical meristems are initiated**. Hence, the ensuing 
torpedo stage at the transition from mid- to late-embryogenesis 
marks an ontogenic progression that seems more advanced than the 
phylotypic stage known from animals. Considering that morpho- 
logical diversity and many important organs in flowering plants 
develop post-embryogenically, it is possible that the phylotypic stage 
may be shifted towards the transition from mid- to late-embryogenesis 
compared to animals. 

Furthermore, the torpedo stage roughly marks the transition from 
morphogenesis to the maturation phase. Morphogenesis involves the 
establishment of the embryo’s body plan, whereas maturation involves 
cell expansion and accumulation of storage macromolecules to prepare 
for desiccation, germination and early seedling growth”. Whereas all 
land plants/embryophytes (all species from PS4 on) including lower land 
plants pass through a morphogenesis phase, only the embryogenesis of 
higher land plants concludes with a maturation phase. Completely dif- 
ferent signalling cascades are involved in both phases. One set is switched 
off and the other one is initiated. Because torpedo stage embryos are in 
the transition between these different developmental programs, it is 
conceivable that transcriptional programs are likewise reduced to con- 
served and evolutionary ancient processes that are reflected by the neck 
of the hourglass (Fig. 2). 

Encouraged by these findings, we seek to understand how the 
molecular hourglass pattern of the TAI profile is determined. Two 
simple scenarios that would result in a decrease of TAI values include 
up-regulation of old genes, or down-regulation of young genes during 
mid-embryogenesis. To distinguish between both scenarios, we com- 
pute the relative expression levels of genes from phylostrata containing 
pre-embryogenesis species (PS1-PS3) versus post-embryogenesis 
phylostrata (land plants/embryophytes from PS4—PS13, representing 
plant species that pass through embryogenesis). Whereas expression 
levels of old genes vary only marginally across embryo stages, young 
genes are down-regulated towards the torpedo stage, and the ratio of 
the relative expression levels of old and young genes is maximized in 
the torpedo stage (Fig. 3a, Supplementary Fig. 5). Next, we divide the 
genes along the median of the Ka/Ks ratios over all genes and perform 
an analogous analysis for conserved (below median) versus divergent 
(above median) genes. Interestingly, we find a similar pattern, with 
divergent genes being more down-regulated towards the torpedo 
stage than conserved genes (Fig. 3b; Supplementary Figs 6-9). These 
results are confirmed by the independent dataset’*"? (Supplementary 
Figs 10-14). 

Figure 3 | Relative expression levels over embryo stages. a, Left axis, mean 
relative expression levels of genes in PS1-PS3 (open bars) and PS4-PS13 
(shaded bars); relative expression levels range from 0 to 1. Right axis, ratio of 
mean relative expression levels between PS1-PS3 and PS4-PS13, data points 
connected by dashed line. b, Analogously to a, genes were divided along the 
median of the Ka/Ks ratios over all genes. Open and shaded bars show Ka/Ks 
values respectively below and above the median; data points connected by 
dashed line show the ratio of low to high Ka/Ks values. Error bars, standard 
error. Asterisks denote significant differences between PS1-PS3 and PS4-PS13 
values (a) and conserved (below median) versus divergent (above median) 
genes at the torpedo stage (b); *P < 0.05; ***P < 0.0005. 


Hence, the embryonic hourglass in A. thaliana seems to be coordi- 
nated by the quantitative down-regulation of young/divergent genes 
or, qualitatively, by the expression of fewer young/divergent genes 
towards the torpedo stage. This is in notable agreement with observa- 
tions from the animal kingdom’***”. As only a fraction of these down- 
regulated young genes in A. thaliana display an hourglass shaped 
expression profile across the sampled embryo stages themselves, the 
hourglass pattern is most probably caused by different sets of young 
genes. One set is involved in morphogenesis (up-regulated before 
the torpedo stage and down-regulated thereafter) and one set is 
involved in maturation (down-regulated before the torpedo stage 
and up-regulated thereafter; Supplementary Figs 15 and 16). In addi- 
tion, we find that significantly enriched gene ontology terms among 
young plant genes down-regulated in the torpedo stage compared to 
early- or late embryogenesis describe signalling processes, such as 
responses to endogenous stimuli and hormones (Supplementary 
Tables 5 and 6). This indicates that signalling processes controlling 
transcription of relatively recently evolved genes are down-regulated 
during the predicted phylotypic stage of A. thaliana embryogenesis. 

Figure 4 | Convergent evolution of a molecular hourglass in animal and 
plant embryogenesis. Originating from a single-celled common ancestor, 
animal and plant lineages evolved both multicellularity and embryogenesis 
independently. For the coordinated progression of the organisms through 
embryogenesis, the transcriptomes have to follow an hourglass pattern with 
maximally ancient and conserved transcriptomes during the phylotypic stage. 

Using a phylotranscriptomic approach based on two complementary 
measures of evolutionary distance and two independent datasets, we 
have observed a molecular embryonic hourglass in plants, which seems 
to be predominantly caused by down-regulation of young and divergent 
genes towards the torpedo stage (Fig. 4). This observation is surprising 
for two reasons. First, morphological diversity during embryogenesis of 
flowering plants is negligible, so the increase of both transcriptome 
indices in late embryogenesis precedes the morphological differences 
established only during post-embryonic development. Second, conver- 
gent evolution of a molecular hourglass pattern in animals and plants 
suggests operation of a fundamental developmental profile controlling 
the expression of evolutionarily young or rapidly evolving genes across 
kingdoms. We speculate that such a mechanism may be required for 
enabling spatio-temporal organization and differentiation of complex 
multicellular life. 


METHODS SUMMARY 


Following ref. 7, 1,459 species with completely sequenced genomes (Supplemen- 
tary Table 1) were sorted into 13 phylostrata (Supplementary Fig. 1), and amino 
acid sequences of A. thaliana were compared with the amino acid sequences of 
these species using Blast. If no Blast hit was identified, the corresponding gene of 
A. thaliana was assigned to PS13; otherwise, it was assigned to the phylogenetically 
most distant phylostratum containing at least one Blast hit. This procedure 
resulted in the phylostratigraphic map shown in Fig. 1a. 

Orthologous gene pairs of A. thaliana and A. lyrata, T. halophila, C. rubella, or 
B. rapa were determined with the method of best hits using Blastp. Amino acid 
sequence alignments of each pair generated with MAFFT” (L-INS-i option) were 
used for codon alignments generated with PAL2NAL” to compute sequence 
divergence levels (Ka/Ks) with GESTIMATOR*. Gene pairs with Ka < 0.5, 
Ks < 5 and Ka/Ks ratios < 2 were retained. 

Introduced in ref. 7, the TAI of developmental stage s is the weighted mean of 
the phylostratum $ps_i$ of gene i weighted by the expression level $e_{is}$ of gene i at 
developmental stage s: 

$$\mathrm{TAI}_s = \frac{\sum_{i=1}^{n} ps_i \, e_{is}}{\sum_{i=1}^{n} e_{is}} \tag{1}$$

where n is the total number of genes analysed. Low PS values correspond to 
evolutionarily old genes, so low TAI values correspond to evolutionarily old tran- 
scriptomes. Likewise, high PS values correspond to evolutionarily young genes, so 
high TAI values correspond to evolutionarily young transcriptomes. 

Analogously, we introduce the TDI of developmental stage s by replacing $ps_i$ in 
equation (1) by the Ka/Ks ratio of gene i: 

$$\mathrm{TDI}_s = \frac{\sum_{i=1}^{n} (K_a/K_s)_i \, e_{is}}{\sum_{i=1}^{n} e_{is}} \tag{2}$$

Here, low/high Ka/Ks ratios correspond to conserved/divergent genes, so low/high 
TDI values correspond to conserved/divergent transcriptomes. 


Full Methods and any associated references are available in the online version of 
the paper. 

METHODS 


Phylostratigraphic procedure. A full account of the procedure of constructing a 
phylostratigraphic map has been presented previously”’*. Adapted to A. thaliana, 
the following procedure was used in this study. The phylogeny of A. thaliana has 
been assigned according to the NCBI taxonomy database. For each of the 13 
phylostrata shown in Supplementary Fig. 1, all 6,617,032 amino acid sequences 
of all 1,459 species with completely sequenced genomes were extracted from 
Phytozome” and the databases listed in Supplementary Table 1. A database of 
these sequences was generated, and each of the 27,258 amino acid sequences of 
A. thaliana (TAIR9) with a minimum length of 30 amino acids was blasted against 
this database using blastp (BLAST version 2.2.21) with an E-value cut-off of 10°. 
If no blast hit was identified, the corresponding gene of A. thaliana was assigned to 
phylostratum 13 (PS13). Otherwise, the corresponding gene of A. thaliana was 
assigned to the phylogenetically most distant (oldest) phylostratum containing 
at least one species with at least one blast hit (Supplementary Table 2). This 
procedure resulted in the phylostratigraphic map shown in Fig. 1a. To investigate 
the dependence of the results on the blastp stringency, the entire procedure was 
repeated for varying E-value cut-offs of blastp ranging from 10° ' to 10°, result- 
ing in the TAI profiles shown in Supplementary Fig. 17. 
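
The assignment rule described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' pipeline: the function name, the species names and the cut-off used in the example are hypothetical, and the E-value cut-off is deliberately left as a parameter.

```python
YOUNGEST_PS = 13  # PS13: no blast hit was identified


def assign_phylostratum(hits, species_to_ps, evalue_cutoff):
    """Return the oldest (lowest-numbered) phylostratum with a blast hit.

    hits:           iterable of (species, e_value) pairs for one gene.
    species_to_ps:  maps each species to its phylostratum (1 = oldest).
    evalue_cutoff:  hits with a larger E-value are ignored (assumption:
                    a hit exactly at the cut-off is kept).
    """
    strata = {species_to_ps[sp] for sp, ev in hits if ev <= evalue_cutoff}
    return min(strata, default=YOUNGEST_PS)


# Hypothetical species and phylostratum labels, for illustration only.
species_to_ps = {"species_a": 1, "species_b": 3, "species_c": 12}
cutoff = 1e-5  # illustrative value, not taken from the text

ps_old = assign_phylostratum(
    [("species_b", 1e-20), ("species_c", 1e-50)], species_to_ps, cutoff)
ps_weak = assign_phylostratum([("species_a", 0.5)], species_to_ps, cutoff)
ps_none = assign_phylostratum([], species_to_ps, cutoff)
```

Repeating the call with different values of `evalue_cutoff` corresponds to the stringency check described in the text.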

Ka/Ks ratios. Orthologous gene pairs of A. thaliana and A. lyrata or the closely 
related Brassicaceaes T. halophila, C. rubella or B. rapa were determined with the 
method of best hits using blastp. Amino acid sequence alignments of each pair 
generated with MAFFT’® (L-INS-i option) were used for codon alignments 
generated with PAL2NAL” to compute sequence divergence levels (Ka/Ks) with 
GESTIMATOR”. Gene pairs with Ka < 0.5, Ks < 5 and Ka/Ks ratios < 2 were 
retained (Supplementary Tables 3 and 4). 
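
The retention rule for gene pairs can be written as a small predicate. This sketch assumes that the first threshold applies to Ka and the second to Ks; the function name and the handling of Ks = 0 are assumptions as well.

```python
def retain_pair(ka, ks, max_ka=0.5, max_ks=5.0, max_ratio=2.0):
    """True if an orthologous gene pair passes all three thresholds."""
    if ks <= 0:
        return False  # ratio undefined; dropping such pairs is an assumption
    return ka < max_ka and ks < max_ks and ka / ks < max_ratio


# Hypothetical (Ka, Ks) estimates for four gene pairs.
pairs = {"g1": (0.1, 0.5), "g2": (0.6, 1.0), "g3": (0.4, 0.1), "g4": (0.1, 6.0)}
kept = {gene for gene, (ka, ks) in pairs.items() if retain_pair(ka, ks)}
```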

TAI and TDI. Expression levels for the first dataset were extracted from ref. 11, 
and after outlier detection and ID mapping, 25,158 genes represented on the 
microarrays were included in subsequent analyses. The TAI and the TDI are 
weighted means of evolutionary age and sequence divergence, respectively, and 
defined as follows. Introduced in ref. 7, the transcriptome age index $\mathrm{TAI}_s$ of 
developmental stage s (s = zygote, quadrant, globular, heart, torpedo, bent 
cotyledon, or mature) is the weighted mean of the evolutionary age (phylostratum) 
$ps_i$ of gene i weighted by the expression level $e_{is}$ of gene i at developmental stage s, 

$$\mathrm{TAI}_s = \frac{\sum_{i=1}^{n} ps_i \, e_{is}}{\sum_{i=1}^{n} e_{is}}$$


where n is the total number of genes analysed. Low PS values correspond to 
evolutionarily old genes, so low TAI values correspond to evolutionarily old tran- 
scriptomes. Likewise, high PS values correspond to evolutionarily young genes, so 
high TAI values correspond to evolutionarily young transcriptomes. 

Analogously, we introduce the transcriptome divergence index $\mathrm{TDI}_s$ of 
developmental stage s simply by replacing $ps_i$ in the above equation by the 
Ka/Ks ratio of gene i, 

$$\mathrm{TDI}_s = \frac{\sum_{i=1}^{n} (K_a/K_s)_i \, e_{is}}{\sum_{i=1}^{n} e_{is}}$$

Hence, low/high Ka/Ks ratios correspond to conserved/divergent genes, so low/ 
high TDI values correspond to conserved/divergent transcriptomes. 

The same procedure was repeated for the second independent dataset covering 
the embryo propers of A. thaliana embryo stages pre-globular, globular, heart, 
linear cotyledon/torpedo, and mature’*"? (GEO accession number GSE12404). 
We normalized this dataset using the GCRMA package (version 2.0) from the 
Bioconductor project with default parameter settings*®. For each probe set we 
computed the stage-wise arithmetic mean of the replicates to get representative 
expression values for each stage. Here, 20,031 genes represented on the micro- 
arrays were included in the analyses. 

Statistical significance of TAI and TDI profiles. To determine the statistical 
significance of the TAI and TDI profiles, the following permutation test was 
performed. The variance $V_{\mathrm{TAI}}$ of the seven values of $\mathrm{TAI}_s$ (for s = zygote, 
quadrant, ..., mature) was computed as test statistic. For determining the null 
distribution of $V_{\mathrm{TAI}}$, all PS values within each developmental stage s were 
randomly permuted, seven surrogate values of $\mathrm{TAI}_s$ were computed from this 
permuted dataset, and a surrogate value of $V_{\mathrm{TAI}}$ was computed from these seven 
surrogate values of $\mathrm{TAI}_s$. 

This procedure was repeated 1,000 times, yielding a histogram of 1,000 values of 
$V_{\mathrm{TAI}}$, which can be approximated by a gamma distribution. The two parameters of 
the gamma distribution were estimated by the method of moments, the fitted 
gamma distribution was considered the null distribution of $V_{\mathrm{TAI}}$, and the 
P-value of the observed value of $V_{\mathrm{TAI}}$ was computed from this null distribution. 

The same procedure was repeated for the seven values of $\mathrm{TDI}_s$, yielding 
a P-value of the TDI profile. Likewise, the second dataset'*'? was analysed 
accordingly. 
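
The permutation step and the method-of-moments fit can be sketched as follows. This is a hedged illustration rather than the authors' implementation: the function names and toy data are hypothetical, the sample variance is assumed as the test statistic, and PS values are reshuffled independently for every stage, following the wording above. The last step, the upper-tail probability of the observed variance under the fitted gamma distribution, needs an incomplete gamma function from a numerical library and is left out.

```python
import random
from statistics import mean, variance


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def profile_variance(gene_values, expression_by_stage):
    """Variance of the index across stages, used as the test statistic."""
    return variance([weighted_mean(gene_values, e) for e in expression_by_stage])


def null_variances(gene_values, expression_by_stage, n_perm=1000, seed=0):
    """Surrogate variances obtained by permuting the per-gene values."""
    rng = random.Random(seed)
    values = list(gene_values)
    null = []
    for _ in range(n_perm):
        surrogate = []
        for expr in expression_by_stage:
            rng.shuffle(values)  # fresh permutation for every stage
            surrogate.append(weighted_mean(values, expr))
        null.append(variance(surrogate))
    return null


def gamma_method_of_moments(sample):
    """Shape and scale of a gamma distribution matching mean and variance."""
    m, v = mean(sample), variance(sample)
    return m * m / v, v / m


# Hypothetical toy data: six genes, three stages.
ps = [1, 1, 2, 4, 9, 13]
expr = [[9, 8, 7, 3, 2, 1], [5, 5, 5, 5, 5, 6], [1, 2, 3, 7, 8, 9]]
observed = profile_variance(ps, expr)
null = null_variances(ps, expr, n_perm=1000, seed=1)
shape, scale = gamma_method_of_moments(null)
```

Passing Ka/Ks ratios instead of phylostrata gives the corresponding null distribution for the TDI profile.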

Relative expression levels for phylostrata. Relative expression levels were 
computed as described previously’. In brief, the mean expression level $\bar{e}_{js}$ of 
phylostratum j and developmental stage s was computed for each j and s as the arithmetic 
mean of expression levels $e_{is}$ of all genes i belonging to phylostratum j. The mean 
expression levels $\bar{e}_{js}$ were linearly transformed to the interval [0,1] according to 

$$f_{js} = \frac{\bar{e}_{js} - \bar{e}_{j,\min}}{\bar{e}_{j,\max} - \bar{e}_{j,\min}}$$

where $\bar{e}_{j,\min}$/$\bar{e}_{j,\max}$ is the minimum/maximum mean expression level of phylostratum 
j over the seven developmental stages s. This linear transformation corresponds to a 
shift by $\bar{e}_{j,\min}$ and a subsequent shrinkage by $\bar{e}_{j,\max} - \bar{e}_{j,\min}$. As a result, the relative 
expression level $f_{js}$ of the developmental stage s with minimum $\bar{e}_{js}$ is 0, the relative 
expression level $f_{js}$ of the developmental stage s with maximum $\bar{e}_{js}$ is 1, and the 
relative expression levels $f_{js}$ of all other stages s range between 0 and 1, accordingly. 
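
The two steps described above, averaging per phylostratum and rescaling to [0,1], can be sketched as follows. The function names and toy values are hypothetical; the sketch only illustrates the transformation.

```python
from statistics import mean


def phylostratum_means(phylostrata, expression_by_stage):
    """Mean expression of each phylostratum at each stage."""
    members = {}
    for gene, ps in enumerate(phylostrata):
        members.setdefault(ps, []).append(gene)
    return {
        ps: [mean([stage[g] for g in genes]) for stage in expression_by_stage]
        for ps, genes in members.items()
    }


def relative_expression(stage_means):
    """Linear rescaling so that the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(stage_means), max(stage_means)
    if hi == lo:
        raise ValueError("expression is constant across stages")
    return [(e - lo) / (hi - lo) for e in stage_means]


# Hypothetical toy data: three genes (two in PS1, one in PS13), three stages.
toy_ps = [1, 1, 13]
toy_expr = [[2, 4, 10], [4, 6, 4], [6, 8, 1]]
means = phylostratum_means(toy_ps, toy_expr)
rel = {ps: relative_expression(m) for ps, m in means.items()}
```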

Next, relative expression levels were grouped into two PS classes, where the first 
PS class consists of relative expression levels of genes belonging to the three oldest 
phylostrata PS1-PS3, and where the second PS class consists of relative expression 
levels of genes belonging to the younger phylostrata PS4-PS13. This grouping was 
chosen to distinguish phylostrata of plants that pass through embryogenesis 
(PS4-PS13) from the remaining phylostrata (PS1-PS3), in which the vast majority of 
species did not evolve embryogenesis. 

For each developmental stage s and each PS class, the mean value and standard 
error of the relative expression levels were computed. In addition, the ratio 
(fold-change) of the two relative expression levels was computed for each 
developmental stage s, and Welch’s two-sample t-test was performed. 

Relative expression levels for Ka/Ks quantiles. In contrast to PS values, which are 
discrete, Ka/Ks ratios are continuous. For computing relative expression levels of 
genes belonging to different Ka/Ks groups, continuous Ka/Ks ratios were grouped 
into deciles (10% quantiles). Relative expression levels of these ten Ka/Ks groups 
were computed in analogy to the computation of relative expression levels of the 
13 phylostrata. 

Likewise, relative expression levels were grouped into two Ka/Ks classes, where 
the first Ka/Ks class consists of relative expression levels of genes belonging to the 
first five Ka/Ks groups (Ka/Ks ratios below median, conserved genes), and where 
the second Ka/Ks class consists of relative expression levels of genes belonging to 
the remaining five Ka/Ks groups (Ka/Ks ratios above median, divergent genes). This 
grouping was chosen because the median is a natural choice, making both Ka/Ks 
classes equally large, and because the grouping of genes into different PS classes 
also resulted in two PS classes of roughly similar sizes (first dataset: 10,695 genes in 
PS1-PS3 and 14,463 genes in PS4-PS13; second dataset: 9,028 and 11,003 genes, 
respectively). 
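
Grouping by deciles and splitting into two classes can be sketched as below. The function names are hypothetical, and ranking tied ratios by their input order is an assumption; the `n_low_deciles` parameter allows splits other than the median.

```python
def decile_groups(ka_ks):
    """Gene indices sorted by Ka/Ks and cut into ten near-equal groups."""
    order = sorted(range(len(ka_ks)), key=lambda i: ka_ks[i])
    n = len(order)
    return [order[n * k // 10: n * (k + 1) // 10] for k in range(10)]


def split_classes(ka_ks, n_low_deciles=5):
    """Indices of conserved (low Ka/Ks) and divergent (high Ka/Ks) genes."""
    groups = decile_groups(ka_ks)
    low = [i for g in groups[:n_low_deciles] for i in g]
    high = [i for g in groups[n_low_deciles:] for i in g]
    return low, high


# Hypothetical Ka/Ks ratios of twenty genes, largest first.
toy_ratios = [0.05 * k for k in range(20, 0, -1)]
conserved, divergent = split_classes(toy_ratios)       # median split
low_2, high_8 = split_classes(toy_ratios, n_low_deciles=2)
```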

The computation of mean values, standard errors, fold-changes and P-values of 
Welch’s two-sample t-test was performed as described in the previous section. 

To investigate the dependence of the results on the grouping into two Ka/Ks 
classes, the entire analysis was repeated for the following six pairs of Ka/Ks classes: 
two deciles/eight deciles, three deciles/seven deciles, ..., seven deciles/three deciles. 
The six resulting plots of means, standard errors, fold-changes and P-values are 
presented in Supplementary Figs 6-9, 11-14. 

Gene ontology analysis. Enrichment analysis of gene ontology terms was 
performed on genes from PS4-PS13 that were down-regulated at least two- 
fold in the torpedo stage compared to at least one of the developmental 
stages zygote/quadrant/globular/heart/bent cotyledon/mature (first dataset), or 
pre-globular/globular/heart/mature (second dataset). Gene ontology term enrich- 
ment was analysed using AmiGO*’ (filter options: TAIR and A. thaliana, 
P-value = 1 X 1077). 
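
The gene selection that precedes the enrichment analysis can be sketched as a filter. This is an illustration under assumptions: expression values are taken to be on a linear scale, so that a two-fold reduction means at most half the level of another stage, and the function name and toy values are hypothetical.

```python
def downregulated_in_torpedo(expression, phylostratum, fold=2.0,
                             reference="torpedo", min_ps=4):
    """True if a PS4-PS13 gene is at least `fold` lower in the reference
    stage than in at least one other stage (linear scale assumed)."""
    if phylostratum < min_ps:
        return False
    ref = expression[reference]
    return any(level >= fold * ref
               for stage, level in expression.items() if stage != reference)


# Hypothetical expression levels of single genes across three stages.
gene_a = {"zygote": 8.0, "torpedo": 3.0, "mature": 5.0}
gene_b = {"zygote": 5.0, "torpedo": 3.0, "mature": 4.0}

selected_a = downregulated_in_torpedo(gene_a, phylostratum=10)
selected_b = downregulated_in_torpedo(gene_b, phylostratum=10)
too_old = downregulated_in_torpedo(gene_a, phylostratum=2)
```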

Pregnancy imprints regulatory memory that 
sustains anergy to fetal antigen 


Jared H. Rowe', James M. Ertelt'+, Lijun Xin'+ & Sing Sing Way't 


Pregnancy is an intricately orchestrated process where immune 
effector cells with fetal specificity are selectively silenced. This 
requires the sustained expansion of immune-suppressive maternal 
FOXP3+ regulatory T cells (Treg cells), because even transient partial 
ablation triggers fetal-specific effector T-cell activation and 
pregnancy loss'”. In turn, many idiopathic pregnancy complications 
proposed to originate from disrupted fetal tolerance are associated 
with blunted maternal Treg expansion*>. Importantly, however, the 
antigen specificity and cellular origin of maternal Treg cells that 
accumulate during gestation remain incompletely defined. Here 
we show that pregnancy selectively stimulates the accumulation of 
maternal FOXP3+ CD4 cells with fetal specificity using tetramer-based 
enrichment that allows the identification of rare endogenous 
T cells®. Interestingly, after delivery, fetal-specific Treg cells persist at 
elevated levels, maintain tolerance to pre-existing fetal antigen, 
and rapidly re-accumulate during subsequent pregnancy. The 
accelerated expansion of Treg cells during secondary pregnancy 
was driven almost exclusively by proliferation of fetal-specific 
FOXP3+ cells retained from prior pregnancy, whereas induced 
FOXP3 expression and proliferation of pre-existing FOXP3+ cells 
each contribute to Treg expansion during primary pregnancy. 
Furthermore, fetal resorption in secondary compared with primary 
pregnancy becomes more resilient to partial maternal FOXP3+ cell 
ablation. Thus, pregnancy imprints FOXP3+ CD4 cells that sustain 
protective regulatory memory to fetal antigen. We anticipate that 
these findings will spark further investigation on maternal regulatory 
T-cell specificity that unlocks new strategies for improving 
pregnancy outcomes and novel approaches for therapeutically 
exploiting Treg cell memory. 

The accumulation of maternal Treg cells during pregnancy 
parallels the need for expanded tolerance to encompass ‘non-self’ fetal 
antigens**7*. However, one consequence of sustained FOXP3+ cell 
expansion is susceptibility to prenatal infection’. Given the increasingly 
recognized importance of Treg specificity in regulating the fluid 
balance between immune activation that maintains host defence and 
immune suppression that prevents autoimmunity”“*, we reasoned 
that establishing the specificity of maternal Treg cells that expand during 
pregnancy could unravel ways to dissociate their beneficial and detrimental 
impacts. Furthermore, extending this analysis post-partum may 
allow the regulatory memory recently described for Treg cells responsive 
to an induced self antigen to be investigated in a more physiological 
context'®. To address these questions, we developed a mating strategy 
where the I-Ab 2W1S55-68 peptide (a variant of peptide residues 55-68 
for the alpha chain of the mouse major histocompatibility complex 
(MHC) class II, I-E*) becomes a surrogate fetal antigen using male 
mice (H-2d Balb/c or H-2b C57BL/6 [B6]) engineered to co-express 
this peptide with β-actin to impregnate non-2W1S-expressing B6 
females’®. In turn, the high precursor frequency of CD4 cells with 
2W1S55-68 specificity allows endogenous maternal Treg cells to this 
surrogate fetal antigen to be identified using MHC class II tetramer 
enrichment’. 


Using this approach, maternal CD4 cells with fetal-2W1S specificity 
were found to sharply upregulate CD44 expression, progressively 
accumulate throughout pregnancy, and persist at approximately tenfold 
increased levels through day 100 post-partum compared with 
non-pregnant controls (Fig. 1a). Maternal 2W1S+ cell expansion 
was specific to mating with 2W1S-expressing mice because they did 
not accumulate in females impregnated by non-transgenic Balb/c 
males (Supplementary Fig. 1). Because seminal fluid also contains cells 
of paternal origin’’, 2W1S+ cells in female mice rendered infertile with 
low-dose irradiation were also enumerated. We found that although 
mating without pregnancy stimulated modest 2W1S+ cell expansion 
and CD44 upregulation, the magnitude was markedly reduced compared 
with pregnant mice (Supplementary Fig. 1). Thus, maternal 2W1S+ CD4 
cell expansion during pregnancy reflects an antigen-specific response 
to cells of fetal origin. 

Given the essential requirement for Treg cells in maintaining fetal 
tolerance*”'*"”, we investigated FOXP3 expression among maternal cells 
with fetal-2W1S specificity. Beginning mid-gestation, 2W1S+ compared 
with 2W1S− CD4 cells became enriched for FOXP3 expression 
in allogeneic (Fig. 1a, b), as well as syngeneic pregnancy (Supplementary 
Fig. 2). As pregnancy progressed, FOXP3 expression among 
2W1S+ cells became progressively more pronounced, peaking at around 
50% late gestation through to the first 48 h post-partum (embryonic day 
18.5 (E18.5) to post-partum day 2 (PP2)) (Fig. 1a, b and Supplementary 
Fig. 3). Furthermore, 2W1S+FOXP3+ cells, and to a lesser extent 
2W1S+FOXP3− cells, upregulated the proliferation marker Ki67 that 
paralleled expanding fetal tissue (Fig. 1c). Reciprocally after expulsion 
of the fetus (PP14 to PP100), Ki67 expression among 2W1S+FOXP3+ 
and 2W1S+FOXP3− cells became reduced (Fig. 1c). However, despite 
diminished Ki67 levels, FOXP3 expression among 2W1S+ cells 
was sustained at ~20% through day 100 post-partum (Fig. 1a, b). 
Accordingly, maternal Treg cells with fetal specificity selectively 
accumulate during pregnancy and persist following parturition. 

Interestingly, maternal Treg cells with fetal-2W1S specificity also 
progressively downregulated Helios (also known as IKZF2) expression 
that dropped to its lowest level of ~40% Helios+ by late gestation, 
whereas the few 2W1S+FOXP3+ cells in non-pregnant mice were 
uniformly Helios+ (Fig. 1d). Comparatively, Helios expression among bulk 
maternal Treg cells did not shift significantly. Although this discordance 
in Helios expression may suggest conversion of fetal-specific FOXP3− 
cells into FOXP3+ cells”, the recent finding that some peripherally 
induced Treg cells also express Helios led us to more definitively 
investigate the origin of maternal Treg cells with fetal specificity”. In particular, 
we asked whether mating with 2W1S-expressing males can convert 
2W1S* FOXP3” CD4 cells from Foxp3?™P?® donors ablated of Treg 
cells with diphtheria toxin? (Foxp3?'*/?T® Treg cells carry the human 
diphtheria toxin receptor (DTR) fused to an internal ribosome entry 
site into the 3’ untranslated region of Foxp3, rendering them susceptible 
to ablation with low-dose diphtheria toxin) into FOXP3* cells after 
adoptive transfer into virgin Foxp3"/“" recipient mice. By mid- 
gestation, 2W1S*FOXP3" among Tyeg-ablated donor fogs 


1University of Minnesota School of Medicine, Departments of Pediatrics and Microbiology, Center for Infectious Disease and Microbiology Translational Research, Center for Immunology, Minneapolis, 
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Figure 1 | Accumulation of maternal CD4 and FOXP3* T,.g cells with fetal 


Treg 
specificity during gestation. a, Total 2W1S* or 2W1S* FOXP3* CD4 cells in 


B6 females impregnated by Balb/c-2W1S males. b, Percentage FOXP3* 
among 2W1S" or 2W1S CD4 cells. c, Percentage Ki67* among 
2W1S' FOXP3* or 2W1S*FOXP3~ CD64 cells. d, Percentage Helios” among 


cells were readily recovered, illustrating induction of maternal T,., cells 
with fetal specificity (Fig. le). This conversion was pregnancy-specific 
and not due to incomplete donor T,.g ablation because FOXP3° cells 
were undetectable among T;,g-ablated donor cells in unmated control 
mice (Supplementary Fig. 4). Importantly, however, FOXP3* among 

Tyeg-ablated donor CD4 cells was also consistently reduced (by ~50%) 
compared with either 2W1S*FOXP3* donor cells in mice without 
diphtheria toxin treatment or among recipient CD4 cells not susceptible 
to diphtheria toxin (Fig. le). Thus, FOXP3 induction among FOXP3 ~ 
precursors and proliferation of pre-existing FOXP3° cells each con- 
tribute to the accumulation of maternal T;<g cells with fetal specificity 
during primary pregnancy. 

To further characterize maternal Treg cells with specificity to pre-existing
fetal antigen that persist post-partum, these cells were tracked
during subsequent pregnancy. After secondary mating, maternal
FOXP3+ cells with fetal-2W1S specificity accumulated with accelerated
kinetics in an antigen-specific fashion (Fig. 2a and Supplementary
Fig. 5). The more rapid expansion of maternal Treg cells in separate
groups of mice was recapitulated within the same mouse by measuring
2W1S+FOXP3+ or 2W1S−FOXP3+ CD4 cells. e, Percentage FOXP3+ among
Foxp3DTR/DTR donor (CD45.1+) or Foxp3WT/WT recipient (CD45.2+) 2W1S+
CD4 cells mid-gestation (E11.5) by Balb/c-2W1S males, with diphtheria toxin
(DT) treatment (top) or no diphtheria toxin controls (bottom). Bars,
means ± one standard error.


2W1S+ Treg accumulation among donor CD4 cells from post-partum
mice (secondary expansion) adoptively transferred before mating with
2W1S-expressing males, compared with cells in virgin recipient mice
(primary expansion) (Fig. 2b). By substituting CD4 cells from post-partum
Foxp3DTR/DTR mice for adoptive transfer and using diphtheria
toxin to eliminate donor Treg cells before mating, we also addressed
whether the accelerated secondary expansion of maternal Treg cells
with fetal-2W1S specificity reflects more vigorous induction among
FOXP3− cells or proliferation of pre-existing FOXP3+ cells. We found
that, in sharp contrast to primary pregnancy, the ablation of donor
Treg cells from post-partum Foxp3DTR/DTR mice almost uniformly
eliminated their expansion in subsequent pregnancy (Fig. 2c). Thus,
recurrent pregnancy primes the accelerated accumulation of maternal
FOXP3+ cells that expand from pre-existing Treg cells retained from
prior pregnancy.

Expanding this model, the responsiveness of maternal CD4 cells
with fetal specificity was also investigated. We found that 2W1S+ cells
recovered from mice mid-gestation or post-partum, each compared
with non-pregnant controls, did not produce appreciable IFN-γ ex vivo
Figure 2 | Accelerated expansion of maternal Treg cells with fetal specificity
during secondary pregnancy. a, Percentage FOXP3+ among virgin (primary
pregnancy) or post-partum (secondary pregnancy) females before mating or
mid-gestation (E11.5) by Balb/c-2W1S males. b, Percentage FOXP3+ among


following stimulation, consistent with previously described anergy of
maternal cells with fetal specificity (Supplementary Fig. 6). Therefore,
to more fully evaluate the responsiveness of maternal T cells with
fetal-2W1S specificity, we measured their in vivo response to Listeria
monocytogenes engineered to express the 2W1S55–68 peptide (Lm-2W1S)
that potently stimulates TH1 differentiation in other contexts. We
found that 2W1S+ cells expand and upregulate T-bet (also known as
TBX21) expression, each in an antigen-specific fashion, in both naive
mice and mice impregnated by 2W1S-expressing males after Lm-2W1S
inoculation, similar to other intracellular pathogens (Supplementary
Fig. 7). Interestingly, however, 2W1S+ cells in pregnant mice where
2W1S represents a surrogate fetal antigen produced only background
levels of IFN-γ and other effector cytokines, with reciprocal accumulation
of FOXP3+ cells (Fig. 3a and Supplementary Fig. 8). Comparatively,
>15% of 2W1S+ cells in Lm-2W1S-inoculated virgin mice were IFN-γ+
(Fig. 3a). This hypo-responsiveness was specific to fetal-2W1S stimulation,
because 2W1S+ CD4 cells in mice impregnated with non-2W1S-expressing
males produced IFN-γ levels comparable to non-pregnant
controls (Fig. 3a and Supplementary Fig. 8). Given the sustained enrichment
of fetal-specific Treg cells after delivery (Fig. 1a, b), these studies
were extended to investigate whether diminished IFN-γ production
among maternal CD4 cells with specificity to pre-existing fetal antigen
is similarly maintained. Remarkably, IFN-γ production remained
anaemic in post-partum mice previously exposed to 2W1S as a fetal
antigen, whereas post-partum mice without prior fetal-2W1S exposure
produced IFN-γ comparable to non-pregnant controls (Fig. 3b).
Accordingly, pregnancy imprints functional anergy for maternal CD4
cells with fetal specificity that is sustained post-partum.
post-partum donor (CD45.1+) or naive recipient (CD45.2+) 2W1S+ CD4 cells
mid-gestation by Balb/c-2W1S males. c, Percentage FOXP3+ among
Treg-ablated Foxp3DTR/DTR post-partum donor (CD45.1+) or naive recipient
(CD45.2+) 2W1S+ CD4 cells mid-gestation. Bars, means ± one standard error.


To dissociate whether pregnancy-induced T-cell anergy was cell-intrinsic
or imposed by features associated with the post-partum environment,
we measured IFN-γ production by CD4 cells from post-partum
or virgin mice after adoptive transfer into naive recipient mice. We found
that IFN-γ production by donor post-partum and each group of naive (donor
and recipient) 2W1S+ CD4 cells was similar, and notably increased
compared with 2W1S+ cells in un-manipulated post-partum mice following
Lm-2W1S inoculation (Fig. 3b, c). Thus, anergy among maternal
CD4 cells with specificity to pre-existing fetal antigen is not cell-intrinsic,
but maintained by the post-partum environment.

In complementary studies we addressed the importance of maternal
Treg cells in sustaining anergy to cells with specificity to pre-existing
fetal antigen by investigating the effect of replacing the entire Treg
compartment in post-partum mice previously exposed to fetal-2W1S
with naive FOXP3+ cells from virgin mice. Consistent with recent
studies using adoptively transferred Foxp3WT/WT CD4 cells to refill
the cellular compartment in Foxp3DTR/DTR mice sustained on diphtheria
toxin treatment, Treg cells from naive mice efficiently reconstituted
Treg-ablated Foxp3DTR/DTR post-partum mice (Fig. 4a). Using this approach,
we found that replacing maternal FOXP3+ cells in post-partum mice
with Treg cells from naive mice restored IFN-γ production for 2W1S+
CD4 cells (Fig. 4b). Furthermore, whereas only rare 2W1S+FOXP3+
cells that were Helios+ were found among post-partum mice reconstituted
with naive Treg cells, a significant proportion of 2W1S+ cells
expanded in response to Lm-2W1S in intact post-partum mice
remained FOXP3+ (~20%) and Helios+ (~40%) (Fig. 4b). Thus, the
muted expansion of naive FOXP3+ CD4 cells with L. monocytogenes
infection is overcome by pregnancy-induced Treg activation (Fig. 4b and
Figure 3 | The post-partum environment maintains anergy for maternal
CD4 cells with pre-existing fetal specificity. a, IFN-γ-producing 2W1S+ CD4
cells 5 days after Lm-2W1S inoculation in virgin or pregnant mice
mid-gestation by Balb/c-2W1S or Balb/c males. b, IFN-γ-producing 2W1S+ CD4
cells 5 days after Lm-2W1S inoculation in post-partum mice previously
impregnated by Balb/c-2W1S or Balb/c males. c, IFN-γ-producing post-partum
donor (CD45.2+CD90.2+), naive donor (CD45.2+CD90.1+) or naive
recipient (CD45.1+) CD4 cells 5 days after Lm-2W1S inoculation and
stimulation with phorbol myristate acetate/ionomycin. Bars, means ± one
standard error.


Supplementary Fig. 8). By extension, the restored responsiveness of
post-partum 2W1S+ cells after adoptive transfer into Treg-sufficient
naive mice most likely represents dilution of co-transferred maternal
Treg cells with fetal-2W1S specificity (Fig. 3c).

Lastly, to establish how maternal Treg cells with fetal specificity
retained post-partum affect subsequent pregnancy outcomes,
the frequency of fetal resorption triggered by partial maternal
FOXP3+ cell ablation using Foxp3DTR/WT mice was compared between
secondary and primary pregnancy. We found that secondary pregnancy
became significantly more resilient to partial Treg ablation because fetal
resorption was reduced by ~60% compared with primary pregnancy
(Fig. 4c). In turn, fetal resorption in Treg-sufficient Foxp3WT/WT mice
during secondary pregnancy was also significantly reduced from background
levels compared with primary allogeneic pregnancy. Maternal
Treg cells were essential for these protective effects because wholesale
FOXP3+ cell ablation using Foxp3DTR/DTR mice triggered pervasive
fetal resorption equally in secondary and primary allogeneic pregnancy
(Fig. 4c). Importantly, fetal wastage with maternal Treg ablation in this
context was driven by antigen heterogeneity, and not poor maternal
health, because the frequency of resorption was sharply reduced with
Treg ablation in mice bearing syngeneic pregnancy (Supplementary
Fig. 9).

Together, these findings establish a model whereby pregnancy primes
the selective accumulation and activation of maternal Treg cells with fetal
Figure 4 | Maternal post-partum Treg cells mitigate IFN-γ responsiveness
and mediate resiliency to fetal resorption in secondary pregnancy.
a, Representative plots illustrating that the majority (>96%) of FOXP3+ cells are
derived from adoptively transferred CD4 cells in diphtheria-toxin-treated
Foxp3DTR/DTR mice. b, IFN-γ-producing 2W1S+ cells among
CD45.1+CD45.2− cells, accumulation of 2W1S+FOXP3+ cells, and Helios
expression among 2W1S+FOXP3+ Treg cells 5 days after Lm-2W1S
inoculation. c, Percentage fetal resorption during primary (open) or secondary
(shaded) allogeneic pregnancy for Foxp3WT/WT, Foxp3DTR/WT or Foxp3DTR/DTR
females 5 days after diphtheria toxin initiation beginning mid-gestation. Bars,
means ± one standard error.


specificity (Supplementary Fig. 10), and extend the role of antigen-experienced
Treg cells from primary into subsequent pregnancies.
In this regard, whereas maternal Treg cells have been described to expand
up to twofold when examined in a non-antigen-specific fashion, our
results demonstrate that FOXP3+ cells with fetal specificity expand
>100-fold through parturition (Fig. 1a and Supplementary Fig. 3).
After delivery, maternal Treg cells with fetal specificity are sustained
at enriched levels, and are functionally distinct as they re-accumulate
with accelerated kinetics and out-compete ‘naive’ Treg cells during
secondary pregnancy. Similar to discordant functional properties of
naive and activated effector T cells, these results uncover the exciting
possibility of exploiting antigen-specific ‘memory’ Treg cells to dissociate
detrimental and beneficial immune responses. Applied to human
pregnancy, these data may explain why rates of pre-eclampsia, and other
complications associated with disrupted fetal tolerance, are reduced in
secondary compared with primary pregnancy. However, given the
increased risk of pre-eclampsia in recurrent human pregnancy when
the inter-pregnancy interval is extended, waning Treg memory similar
to other CD4 subsets would not be unexpected. Therefore, establishing
the durability of pregnancy-induced regulatory memory and
characterizing whether this response can be sustained with boosting
represent next steps with exceptional scientific importance.


METHODS SUMMARY 

Mice. 2W1S-expressing and Foxp3DTR mice have each been described. Mated
females and the timing of pregnancy were determined by visualization of a
copulation plug (E0.5). For infection, Lm-2W1S (10^4 colony-forming units, see
Methods) was inoculated intravenously. Experiments were performed in accordance
with University of Minnesota IACUC approved protocols.

Tetramer enrichment and flow cytometry. I-Ab 2W1S55–68 tetramer staining
and enrichment have been described. Lymphoid cells from the spleen and lymph
nodes were enriched with 2W1S55–68 tetramer before surface, intracellular or
intranuclear staining. For stimulation, cells were cultured with phorbol myristate acetate
plus ionomycin (5 h) before staining.

Cell transfer and ablation. One mouse equivalent of purified CD4 cells was
intravenously transferred into recipient mice 1 day before mating or infection. Donor
Treg cells from Foxp3DTR mice were ablated in recipient Foxp3WT mice using purified
diphtheria toxin (two doses 8 h apart, 0.5 µg per dose). For ablation of endogenous
Treg cells, diphtheria toxin was administered daily (0.5 µg first dose, followed by
0.1 µg per dose daily thereafter) beginning mid-gestation or immediately following
donor-cell transfer.

Statistics. Data were analysed using the unpaired (separate groups of mice) or
paired (cell subsets within the same mouse) Student’s t test.


Full Methods and any associated references are available in the online version of 
the paper. 
METHODS

Mice. C57BL/6 (B6; H-2b) or CD45.1 (H-2b) and CD90.1 (H-2b) mice on the
B6 background, and Balb/c (H-2d) mice, were purchased from The Jackson
Laboratory or The National Cancer Institute. Transgenic mice expressing
2W1S55–68 antigen behind the β-actin promoter in all cells, and Foxp3DTR/DTR
mice where Treg cells become susceptible to ablation with low-dose diphtheria
toxin have each been described. Foxp3DTR/DTR mice on the B6 background
were intercrossed with CD45.1 mice to generate CD45.1+/+ Foxp3DTR/DTR mice.
For mating, 2W1S-expressing males were used either on the B6 background, or
backcrossed five generations to Balb/c mice, and verified to be H-2d. Males were
introduced to either virgin or post-partum (>14 days) females for 24 h, and mated
mice visualized by a copulation plug representing E0.5. In each experiment, pups
were removed within 24 h after delivery to prevent the potential transfer of fetal
antigen through suckling. For sterilization, female mice were sub-lethally irradiated
(100 rads) before mating.

Tetramer enrichment and flow cytometry. Phycoerythrin or allophycocyanin-conjugated
MHC class II I-Ab 2W1S55–68 tetramer, and their use with
anti-fluorophore-conjugated magnetic beads (Miltenyi Biotec) for enrichment have been
described. For enumerating total 2W1S+ cells, all nucleated cells from secondary
lymphoid tissue (spleen and axillary, brachial, cervical, inguinal, mesenteric, pancreatic,
para-aortic/uterine lymph nodes) were collected, enriched using I-Ab 2W1S55–68
tetramer, and stained for cell-surface CD4, CD8α, CD25, CD45.1, CD45.2, CD90.1,
CD90.2, CD44, CD11b, CD11c, B220, F4/80, intracellular IFN-γ, IL-4, IL-10, IL-17A,
or intranuclear FOXP3, Helios, Ki67, or T-bet expression using commercially available
antibodies and cell permeabilization reagents (BD PharMingen or eBioscience).
For stimulation, cells were cultured with phorbol myristate acetate/ionomycin for 5 h
in media supplemented with brefeldin A before tetramer staining.

Cell transfer and ablation. For adoptive transfer, CD4 cells in the spleen and
lymph nodes were first purified by negative selection (Miltenyi Biotec), and one
mouse equivalent of donor cells intravenously transferred into recipient mice
1 day before mating and/or infection. For ablation of donor Foxp3DTR/DTR cells,
recipient Foxp3WT/WT mice were treated with purified diphtheria toxin (Sigma
Chemicals, two doses 8 h apart, 0.5 µg per dose). For ablation of endogenous Treg
cells in Foxp3DTR/WT or Foxp3DTR/DTR mice during pregnancy, purified diphtheria
toxin was administered daily (0.5 µg first dose, followed by 0.1 µg per dose
thereafter beginning mid-gestation) for 4 consecutive days. For FOXP3+ cell
reconstitution requiring sustained ablation of endogenous Treg cells in Foxp3DTR/DTR
recipient mice, purified diphtheria toxin was administered daily (0.5 µg first dose,
followed by 0.1 µg per dose thereafter) immediately following transfer of purified
donor CD4 cells as described.
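
The daily regimen for ablating endogenous Treg cells (a loading dose followed by smaller maintenance doses) can be tabulated with a short helper. This is only an illustrative sketch: the function name `dt_schedule` and its parameters are hypothetical, and the default values are simply the doses quoted in the text (0.5 µg first dose, 0.1 µg per dose thereafter).

```python
def dt_schedule(days, first_dose_ug=0.5, maintenance_ug=0.1):
    """Return the per-day diphtheria toxin doses in micrograms.

    Hypothetical helper: day 1 receives the loading dose and every
    later day receives the maintenance dose, as described in the text.
    """
    if days < 1:
        raise ValueError("days must be at least 1")
    return [first_dose_ug] + [maintenance_ug] * (days - 1)


# Four consecutive days, as used for ablation during pregnancy.
schedule = dt_schedule(4)
cumulative_ug = sum(schedule)
```

For the 4-day course the schedule is one 0.5 µg dose followed by three 0.1 µg doses, so the cumulative exposure is about 0.8 µg per mouse.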
Bacteria. Listeria monocytogenes was engineered to stably express and secrete
2W1S55–68 antigen by sub-cloning the hly promoter, signal sequence, 2W1S55–68
and ovalbumin coding sequence from the pAM401-based expression construct
into the temperature-sensitive plasmid pKSV7. This pKSV7-based plasmid
containing the hly promoter, signal sequence, 2W1S55–68 and ovalbumin coding
sequence was used to electroporate Lm-OVA, and transformants selected by
resistance to chloramphenicol (10 µg ml−1 final concentration) at room
temperature. Individual clones were then passaged five times at 40 °C in brain heart
infusion media (Becton Dickinson) with chloramphenicol selection (plasmid
integration into the L. monocytogenes genome), followed by passage without
antibiotic selection (plasmid excision from the L. monocytogenes genome).
L. monocytogenes clones where double-homologous recombination had occurred
were initially screened by replica plating for sensitivity to chloramphenicol, and
a single clone (Lm-2W1S) verified by PCR and DNA sequencing using previously
described methods. For infection, Lm-2W1S was grown to early log phase
(OD600 nm 0.1) in brain heart infusion media, washed and diluted with sterile
saline, and then inoculated intravenously in the lateral tail vein (10^4
colony-forming units per mouse). Five days thereafter, the antigen-specific CD4 cell
response in the spleen and lymph nodes was investigated with I-Ab 2W1S55–68
tetramer staining.

Data acquisition and analysis. Cells stained with fluorochrome-conjugated tetramer
and antibody were acquired on a FACSCanto cytometer (Becton Dickinson),
and analysed using FlowJo (TreeStar) software. The number and percentage of cells were
then analysed and found to be normally distributed, and thereafter differences
between separate groups of mice were analysed using an unpaired Student’s t test,
whereas differences between individual cell subsets within the same mouse were
analysed using the paired Student’s t test (Prism, GraphPad). For all analyses,
P < 0.05 was taken as statistically significant.
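
The distinction between the two tests can be made concrete with a small sketch. It uses only the Python standard library and returns the t statistic and degrees of freedom; it is not the Prism workflow the authors used. The function names are hypothetical, the unpaired version assumes the classic pooled-variance (equal variance) form of Student’s test, and the example values are invented for illustration rather than taken from the paper.

```python
import math
from statistics import mean, stdev, variance


def unpaired_t(group_a, group_b):
    """Pooled-variance Student's t for two independent groups of mice."""
    n1, n2 = len(group_a), len(group_b)
    pooled_var = ((n1 - 1) * variance(group_a) +
                  (n2 - 1) * variance(group_b)) / (n1 + n2 - 2)
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(group_a) - mean(group_b)) / std_err, n1 + n2 - 2


def paired_t(subset_a, subset_b):
    """Paired Student's t for two cell subsets measured in the same mice."""
    if len(subset_a) != len(subset_b):
        raise ValueError("paired samples must have equal length")
    diffs = [a - b for a, b in zip(subset_a, subset_b)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n)), n - 1


# Invented percentages, for illustration only.
t_unpaired, df_unpaired = unpaired_t([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
t_paired, df_paired = paired_t([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

The paired form works on within-mouse differences, so between-mouse variability cancels out; this is why it suits comparisons of cell subsets recovered from the same animal, whereas the unpaired form is appropriate for separate groups of mice.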
Rapid induction of inflammatory lipid mediators by
the inflammasome in vivo
Detection of microbial products by host inflammasomes is an important
mechanism of innate immune surveillance. Inflammasomes
activate the caspase-1 (CASP1) protease, which processes the cytokines
interleukin (IL)-1β and IL-18, and initiates a lytic host cell
death called pyroptosis. To identify novel CASP1 functions in vivo,
we devised a strategy for cytosolic delivery of bacterial flagellin, a
specific ligand for the NAIP5 (NLR family, apoptosis inhibitory
protein 5)/NLRC4 (NLR family, CARD-domain-containing 4)
inflammasome. Here we show that systemic inflammasome activation
by flagellin leads to a loss of vascular fluid into the intestine and
peritoneal cavity, resulting in rapid (less than 30 min) death in mice.
This unexpected response depends on the inflammasome components
NAIP5, NLRC4 and CASP1, but is independent of the production
of IL-1β or IL-18. Instead, inflammasome activation results,
within minutes, in an ‘eicosanoid storm’, a pathological release of
signalling lipids, including prostaglandins and leukotrienes, that
rapidly initiate inflammation and vascular fluid loss. Mice deficient
in cyclooxygenase-1, a critical enzyme in prostaglandin biosynthesis,
are resistant to these rapid pathological effects of systemic inflammasome
activation by either flagellin or anthrax lethal toxin.
Inflammasome-dependent biosynthesis of eicosanoids is mediated
by the activation of cytosolic phospholipase A2 in resident peritoneal
macrophages, which are specifically primed for the production of
eicosanoids by high expression of eicosanoid biosynthetic enzymes.
Our results therefore identify eicosanoids as a previously unrecognized
cell-type-specific signalling output of the inflammasome with
marked physiological consequences in vivo.

NAIP/NLRC4 inflammasome activation is critical for innate
immune detection and defence against multiple bacterial pathogens
in mice. This resistance to infection, as well as the inflammasome-dependent
response to systemic endotoxin, does not require IL-1β or
IL-18 (refs 5, 8, 9), suggesting a critical role for pyroptosis and/or other
inflammasome functions. In this study we sought to identify previously
unknown in vivo signalling outputs of the NAIP5/NLRC4
inflammasome. To activate NAIP5/NLRC4 selectively, we delivered
Legionella pneumophila flagellin (FlaA) to the cytosol by the fusion of
FlaA to the amino-terminal domain of Bacillus anthracis lethal factor
(LFn), which mediates cytosolic delivery through the anthrax protective
antigen (PA) channel. As expected, PA plus LFn-FlaA (here called
FlaTox), but not PA or LFn-FlaA alone, activated the NAIP5/NLRC4
inflammasome in bone-marrow-derived macrophages (BMDMs), as
indicated by cleavage of CASP1, release of lactate dehydrogenase and
secretion of IL-1β (Supplementary Fig. 1). Inflammasome activation
was abrogated in Casp1−/−, Naip5−/− and Nlrc4−/− BMDMs
(Supplementary Fig. 1b). A mutant form of FlaTox (FlaTox(AAA)),
which is recognized by Toll-like receptor (TLR)-5 but not by NAIP5
(ref. 4), did not activate pyroptosis (Supplementary Fig. 1a).

We then determined the effect of FlaTox administration in vivo.
Intravenous or intraperitoneal delivery of FlaTox killed mice rapidly,
causing symptoms within 15 min and a mean time to death of about
30 min at saturating intravenous doses (Fig. 1a and Supplementary
Fig. 1f). For subsequent experiments we administered a sublethal
intraperitoneal dose. FlaTox-treated mice rapidly developed
diarrhoea; however, histological analysis (Supplementary Fig. 2 and
data not shown) after 30 min revealed no signs of pathology. Instead,
we found fluid accumulation in the peritoneal cavity and intestine, but
not in the kidneys or lungs (Supplementary Fig. 3). This fluid was lost
from the blood, as the percentage volume of red blood cells (haematocrit)
increased to 70–80% (normal 45–50%) within 30–40 min after
FlaTox injection (Fig. 1b). As haematocrit rose, body temperature
dropped, but with delayed kinetics (Fig. 1b). Haemoconcentration
was the earliest pathological event that we detected after treatment
with FlaTox, and is probably the primary cause of the ensuing circulatory
collapse, hypothermia and death.
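
To give a feel for what such a haematocrit shift implies, the rise can be converted into an approximate plasma loss. The sketch below is not a calculation from the paper: it assumes that total red cell volume stays constant over the 30–40 min window, so that any change in haematocrit reflects plasma leaving the circulation. The function name is hypothetical. With red cell volume R and plasma volume P, haematocrit is H = R / (R + P), which gives P = R(1 − H)/H and therefore P_after / P_before = H_before(1 − H_after) / (H_after(1 − H_before)).

```python
def plasma_fraction_remaining(hct_before, hct_after):
    """Fraction of the initial plasma volume still in circulation.

    Assumes constant red cell volume (an illustrative assumption,
    not a statement from the source). Haematocrit values are given
    as fractions between 0 and 1.
    """
    for value in (hct_before, hct_after):
        if not 0 < value < 1:
            raise ValueError("haematocrit must be a fraction between 0 and 1")
    return (hct_before * (1 - hct_after)) / (hct_after * (1 - hct_before))


# Lower bounds of the quoted ranges: 45% rising to 70%.
remaining = plasma_fraction_remaining(0.45, 0.70)
lost = 1 - remaining
```

Under this assumption, a rise from 45% to 70% corresponds to roughly 35% of the plasma volume remaining, that is, a loss of about two-thirds of the plasma.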

We found that only the complete toxin, rather than individual subunits
or FlaTox(AAA), induced hypothermia and increased haematocrit
(Fig. 1c, d and Supplementary Fig. 4a), showing that TLR signalling
activated by flagellin or bacterial contaminants is insufficient to cause
pathology. Nlrc4−/− mice were completely protected at all doses and
time points tested (Fig. 1e, f and data not shown). Naip5−/− and
Casp1−/− mice were completely protected in survival experiments
(Supplementary Fig. 4b), but they showed a delayed haemoconcentration
and hypothermic response (Fig. 1e, f). The moderate response of
Naip5−/− mice may be due to recognition of flagellin by NAIP6
(refs 2, 3). Mice lacking tumour necrosis factor (LTα/LTβ/TNFα−/−)
or IL-1β and IL-18 were as sensitive as wild-type (B6) mice (Fig. 1g, h
and Supplementary Fig. 4b–d), ruling out an essential role for these
cytokines in FlaTox-induced pathologies.

To identify the cell type(s) that respond to FlaTox, we generated
bone marrow chimaeras by using susceptible wild-type (B6) mice and
resistant Nlrc4−/− mice. Nlrc4−/− mice reconstituted with wild-type
bone marrow were completely susceptible to FlaTox (Fig. 2a, b).
Wild-type mice reconstituted with Nlrc4−/− bone marrow also
responded to FlaTox, but with delayed kinetics. These results suggest
that at least two cell populations respond to FlaTox: first, radio-sensitive
haematopoietic cells that are necessary and sufficient for
the early response to FlaTox (0–30 min), and second, radio-resistant
cells that respond after 30 min. Hereafter we focused on the early
haematopoietic response (EHR).

The EHR was intact in mice deficient in mast cells, lymphocytes and 
neutrophils (Supplementary Fig. 5). By contrast, wild-type mice 
depleted of macrophages by using clodronate-loaded liposomes were 
almost completely protected from the EHR (Fig. 2c, d). Similarly, 
depletion of CD11b+ cells in FVB:CD11b-DTR mice conferred
protection (Fig. 2e, f). Flow cytometric analysis revealed almost com- 
plete ablation of CD11b/F4-80" resident peritoneal macrophages by 
both treatments (Supplementary Fig. 6a), whereas depletion of splenic 
and lamina propria macrophages was only partial (clodronate) or not 
Figure 1 | Systemic cytosolic delivery of flagellin in vivo induces NAIP5/
NLRC4-dependent but IL-1β and IL-18-independent vascular leakage.
a–h, Mice (wild-type (B6) or indicated genotypes) were injected intraperitoneally
with FlaTox or the indicated proteins (4 µg g−1 PA; 2 µg g−1 all others) and
monitored for survival (a), haematocrit (b, d, f, h) or rectal temperature
(b, c, e, g) at the indicated times. WT, wild type; RBC, red blood cells. The key in
a indicates doses of LFn-FlaA (in µg g−1). In b, squares show temperature, and
triangles haematocrit. In a, n = 5–14; in b, n = 3; in c–h, n = 4–7. Data shown are
pooled from multiple experiments (a) or are representative of at least two
independent experiments. Error bars in b, c, e and g denote s.e.m. Asterisk,
P < 0.02; two asterisks, P < 0.009 (Mann–Whitney test).


observed (CD11b-DTR) (Supplementary Fig. 6c, d). We therefore 
speculated that resident peritoneal macrophages might mediate 
responsiveness to FlaTox. Indeed, Nlrc4−/− mice injected with wild-type
resident peritoneal cells became sensitive to FlaTox (Fig. 2g). By
contrast, spleen or bone marrow cells did not transfer responsiveness; 
unexpectedly, neither did wild-type thioglycollate-elicited peritoneal 
macrophages or BMDMs (Fig. 2g and Supplementary Fig. 6e). These 
data demonstrate a unique and critical function for resident peritoneal 
macrophages in the inflammasome-dependent in vivo response to 
FlaTox. 

We speculated that peritoneal macrophages might initiate the EHR 
through the inflammasome-dependent secretion of a previously 
unidentified factor. Eicosanoids are paracrine signalling lipids that 
are crucial for the activation of inflammation and host defence, 
and can induce vascular permeability, vasodilation and leukocyte 
Figure 2 | Resident peritoneal macrophages are critical for the early FlaTox
response in vivo. a–f, Mice were injected intraperitoneally with FlaTox, and
rectal temperature (a, c, e) or haematocrit (b, d, f) was measured after 30 min
or at the indicated times. a, b, Bone marrow chimaeric mice (KO = Nlrc4−/−;
n = 5). c, d, Macrophage-depleted wild-type (B6) mice (n = 3–6). e, f, CD11b+
cell-depleted FVB:CD11b-DTR mice (n = 6–7). g, Nlrc4−/− host mice injected
intraperitoneally with 10’ resident (Res.) or thioglycollate-elicited (Thio.) 
peritoneal cells or BMDMs of indicated genotype. Rectal temperature was 
measured 30 min after intraperitoneal injection of FlaTox (8 µg g−1 PA +
4 µg g−1 LFn-FlaA). Data shown are pooled from multiple experiments (g) or
are representative of at least three independent experiments (a-f). Error bars in 
a denote s.e.m. Two asterisks, P< 0.01; three asterisks, P = 0.001 (Mann- 
Whitney test). 


chemotaxis. Eicosanoids are synthesized when arachidonic
acid released from cell membranes by phospholipases is converted
into prostaglandins (PGs) and thromboxanes downstream of the
cyclooxygenases (COX-1 and COX-2) or into hydroxyeicosatetraenoic
acids (HETEs) and leukotrienes (LTs) downstream of the lipoxygenases
(12/15-LOX and 5-LOX) (Supplementary Fig. 7). Direct intraperitoneal
injection of prostaglandins leads to diarrhoea and fluid accumulation in
the gut, both hallmarks of FlaTox-induced pathology.

We therefore speculated that resident peritoneal macrophages, the
specific cells required for the EHR (Fig. 2), produce eicosanoids in
response to inflammasome activation. Using liquid chromatography-tandem
mass spectrometry (LC-MS/MS) lipidomic analysis and
enzyme immunoassay, we detected rapid biosynthesis of numerous
COX-dependent and LOX-dependent eicosanoids including LTB4,
PGE2 and cysteinyl LTs in supernatants of peritoneal lavage cells treated
ex vivo with FlaTox, whereas the FlaTox(AAA) mutant elicited a much
weaker response (Fig. 3 and Supplementary Fig. 8a, b). Moreover,
eicosanoid induction required NAIP5, NLRC4 and CASP1 (Fig. 3b
and Supplementary Fig. 8c). LTB4/PGE2 induction by PA or LFn-FlaA
alone was equivalent to stimulation with FlaTox(AAA) or
lipopolysaccharide, further demonstrating that TLR signalling cannot
account for their rapid biosynthesis (Fig. 3c and Supplementary
Fig. 8b). The flagellated intracellular pathogen Salmonella enterica
serovar Typhimurium (S. typhimurium) also elicited inflammasome- 
dependent eicosanoid biosynthesis in a similar manner to FlaTox 
(Fig. 3d and Supplementary Fig. 8d), demonstrating that this pathway 
is activated during live infection. 

Similarly to the ex vivo results, LC-MS/MS analysis of peritoneal
lavage fluid from mice treated with FlaTox for 20 min revealed robust
inflammasome-dependent eicosanoid biosynthesis in vivo (Supplementary
Fig. 9). Some residual eicosanoid biosynthesis (particularly
of PGE2) was observed after treatment with FlaTox(AAA). This
residual response is probably due to bacterial contaminants (for
example lipopolysaccharide) activating peritoneal cells other than
macrophages, because macrophages did not produce PGE2 in response
to stimulation for 30 min with lipopolysaccharide or FlaTox(AAA)
(Fig. 3c). However, because this residual response did not produce
symptoms (Fig. 1c, d) and eicosanoids have only paracrine effects,
the FlaTox-induced pathology is probably a localized response
mediated by resident peritoneal macrophages and/or the synergistic
effects of multiple eicosanoids. FlaTox did not result in the detectable
production of anti-inflammatory lipoxins. Taken together, our results
show that inflammasome activation results in an ‘eicosanoid storm’ ex
vivo and in vivo, characterized by the broad biosynthesis of both LOX
and COX products, with a bias towards pro-inflammatory lipids.

How does inflammasome activation induce eicosanoid biosynthesis?
Although cells express several initiating A2 phospholipases,
Figure 3 | Inflammasome-dependent eicosanoid biosynthesis. a, LC-MS/
MS-based lipidomics of wild-type (B6) BMDMs or resident peritoneal (IP) cells
incubated for 30 min ex vivo with FlaTox (20 µg ml−1 PA + 10 µg ml−1 LFn-FlaA).
b–d, PGE2 immunoassay of resident peritoneal macrophages treated ex
vivo with FlaTox, the indicated proteins (10 µg ml−1 PA in d; 5 µg ml−1 in all
others), lipopolysaccharide (c, 1 µg ml−1) or S. typhimurium (d, multiplicity of
infection = 5) and incubated for 30 min (b, c) or 180 min (d). Data shown are
representative of at least two (d) or three (a–c) independent experiments. Error
bars in b–d denote s.e.m. Two asterisks, P < 0.01; three asterisks, P < 0.0005
(Student’s t-test). Hash sign, not detected.
Ca2+-dependent cytosolic phospholipase A2 (cPLA2) accounts for
nearly all eicosanoid biosynthesis in peritoneal macrophages treated
with a Ca2+ ionophore. Inhibition of cPLA2 also blocked LTB4 and
PGE2 production but not pyroptosis in response to FlaTox (Fig. 4a and
Supplementary Fig. 8e). An increase in intracellular Ca2+ is both
necessary and sufficient for cPLA2 activation; notably, the earliest
detectable CASP1-dependent events in BMDMs infected with
S. typhimurium are the formation of plasma membrane pores and
the influx of Ca2+ (refs 22, 23). We observed that CASP1-dependent
Ca2+ influx, comparable in magnitude to an ionomycin control
(Supplementary Fig. 10), preceded cell lysis (79 ± 29 s before the onset
of membrane blebbing) in resident peritoneal macrophages (Fig. 4b, c).
This Ca2+ influx seems to be critical for eicosanoid biosynthesis but
dispensable for pyroptosis in response to FlaTox, because the intracellular
Ca2+ chelator bis-(o-aminophenoxy)ethane-N,N,N′,N′-tetraacetic
acid acetoxymethyl ester (BAPTA-AM) inhibited PGE2 and
LTB4 production without blocking lactate dehydrogenase (LDH)
release (Fig. 4d and Supplementary Fig. 8f). Macrophages treated with
glycine to inhibit pyroptosis still produced PGE2 in response to
FlaTox, whereas cells lysed by digitonin or H2O2 did not, further
indicating that eicosanoid production and cell lysis are separable
events (Supplementary Fig. 11). Taken together, our results suggest a
model (Supplementary Fig. 12) in which CASP1 activation results in
rapid Ca2+ flux (Fig. 4b) that leads to cPLA2 activation and downstream
eicosanoid biosynthesis. Identification of a CASP1 substrate
required for Ca2+ influx is an important, but technically challenging,
area for future research because proteomic studies to identify novel
substrates of CASP1 (refs 24–26) have so far yielded limited insight.

Although cPLA₂ activity is regulated primarily by Ca²⁺ influx, its 
activity can be enhanced by mitogen-activated protein kinase 
(MAPK)-dependent phosphorylation. We wondered whether TLR 
signalling activated by FlaTox (FlaA or bacterial contaminants) might 
enhance cPLA₂ activity through downstream MAPKs. Indeed, cPLA₂ 
in resident peritoneal cells was phosphorylated after treatment with 
FlaTox, and this was inflammasome-independent but 
Myd88/Trif-dependent (Fig. 4e). Accordingly, LTB₄ production in response to 
FlaTox was partly attenuated in Myd88/Trif⁻/⁻ cells, and 
Myd88/Trif⁻/⁻ mice were partly protected in vivo (Supplementary Fig. 13). 
Thus, although inflammasome-dependent Ca²⁺ flux is both sufficient 
and necessary for eicosanoid production in response to FlaTox, TLR 
signalling, which is expected to accompany natural infection, can 
synergize with inflammasome activation to produce maximal responses. 

We further investigated the basis for the cell-type specificity of 
inflammasome-dependent eicosanoid biosynthesis. Consistent with 
the inability of BMDMs to transfer responsiveness to FlaTox in vivo 
(Fig. 2g), we observed no eicosanoid production by LC-MS/MS in 
BMDMs treated with FlaTox for 30 min (Fig. 3a). Even at 2h after 
treatment, when most BMDMs have undergone pyroptosis, we could 
not detect PGE₂ or LTB₄ production (Fig. 4f, g and Supplementary Fig. 
8g, h). Cox1, Alox12/15 and Alox5, which encode key enzymes 
required for eicosanoid biosynthesis, are expressed at much higher 
levels (10-1,000-fold) in CD11b⁺/F4-80⁺ resident peritoneal 
macrophages than in BMDMs or thioglycollate-elicited CD11b⁺/F4-80⁺ cells 
(Fig. 4h-j). Resident peritoneal macrophages are therefore uniquely 
primed for eicosanoid responses. Most characterization of inflamma- 
somes has relied on BMDMs, which perhaps explains why a link to 
eicosanoid biosynthesis has not been described. We speculate that the 
primed state of resting peritoneal macrophages may be a general char- 
acteristic of resident macrophages guarding sites of pathogen entry; it 
will be important to further explore the lineage-specific regulation of 
inflammasome function in vivo. 

To test whether eicosanoids contribute to FlaTox-induced pathology, 
we injected B6;129P2-Cox1⁻/⁻ mice and littermate controls with 
FlaTox, and found that the EHR was significantly attenuated in these 
mice (P < 0.0007; Fig. 4k, l) despite intact pyroptosis (Supplementary 
Fig. 14). As expected, given that FlaTox induces the broad biosynthesis 
of both COX-1-dependent and COX-1-independent eicosanoids, the 
role of COX-1 was masked at high doses of FlaTox or at later time 
points (data not shown), suggesting a contribution of 
COX-1-independent eicosanoids or other unidentified factors. In 
confirmation of the results with Cox1⁻/⁻ mice, chemical inhibition of COX-1 
protected mice from a low dose of FlaTox (Supplementary Fig. 15). 
Inhibition of the inducible enzyme COX-2 or the genetic deletion of 
Alox5 had no effect, although their contribution may be masked by 
functional redundancy of downstream eicosanoids (Supplementary 
Fig. 15). These data link the production of COX-1-dependent 
eicosanoids to the pathology associated with in vivo delivery of 
FlaTox, which is consistent with earlier reports that purified 
prostaglandins are sufficient to cause fluid accumulation and diarrhoea. 

We speculated that other inflammasomes might also lead to 
eicosanoid production in vivo. For example, anthrax lethal toxin 
activates the NLRP1B inflammasome, resulting in a rapid (but transient 
and non-lethal) hypothermic response. Because C57BL/6 mice 
express a non-functional allele of Nlrp1b (Nlrp1b^R), we injected 
Cox1⁺/⁺ mice expressing the sensitive 129 allele of Nlrp1b (Nlrp1b^S). 
These mice developed diarrhoea, decreased body temperature and an 
inflammasome-dependent increase in haematocrit (Fig. 4m, n). As 
in FlaTox-treated mice, deletion of Cox1 attenuated these early 
pathologies of lethal toxin (Fig. 4m, n), indicating that this naturally 
occurring bacterial toxin activates a similar, though non-lethal, 
inflammasome pathway in mice. 

Figure 4 | Mechanism and in vivo role of eicosanoid production. a, d, PGE₂ 
immunoassay (30 min) or LDH assay (2 h) on supernatants from wild-type 
(B6) resident peritoneal macrophages pretreated for 45 min with 
dimethylsulphoxide (DMSO), cPLA₂ inhibitor (a, 0.2 µM pyrrophenone) or 
BAPTA-AM (d, 10 µM) then FlaTox + DMSO/inhibitor. b, c, Wild-type (B6) 
or Casp1⁻/⁻ resident peritoneal macrophages were treated with FlaTox or 
FlaTox(AAA) and Ca²⁺ indicator (Fluo-4; 2.5 mM) and fluorescence/ 
background (R/R₀) was quantified over time. b, Representative cell traces. 
c, Maximum R/R₀ for each cell. e, Resident peritoneal cells of indicated 
genotype treated for 30 min as indicated. Cell lysates were probed for indicated 
proteins by western blot (WB). f, g, Resident peritoneal macrophages (f) or 
BMDMs (g) treated with FlaTox. PGE₂ (bars) and cell lysis (line) were 
measured over time as in a. h-j, Expression of Cox1 (h), Alox12/15 (i) and Alox5 
(j) measured by quantitative RT-PCR in BMDMs and resident peritoneal 
macrophages or thioglycollate-elicited peritoneal cells (thio.) sorted for 
CD11b⁺/F4-80⁺. k-n, Mice were injected intraperitoneally with FlaTox (k, l) or 
intravenously with anthrax lethal toxin (m, n; 200 µg PA + 100 µg lethal 
factor); rectal temperature (k, m) and haematocrit (l, n) were measured after 
30 min (k, l) or 45 min (m, n). k, l, B6;129P2-Cox1⁻/⁻ mice and littermate 
controls. m, n, Cox1⁻/⁻ mice and littermate controls expressing a 
lethal-toxin-sensitive (S) or resistant (R) Nlrp1b allele. Data are pooled from multiple 
experiments (k-n) or are representative of at least two (b, c, h-j) or three 
(a, d-g) independent experiments. Error bars in a, d and f-j denote s.e.m. 
Asterisk, P < 0.04; two asterisks, P < 0.006; three asterisks, P < 0.0007 
(k-n, Mann-Whitney test; a, c, d, h-j, Student's t-test). Hash sign, not detected. 

Whereas many cellular immune responses require de novo tran- 
scription, the NAIP5/NLRC4 and NLRP1B inflammasomes assemble 
from preformed protein components to activate a proteolytic cascade. 
These inflammasomes are therefore ideally positioned to mediate 
rapid responses to infection. Initiated within minutes of flagellin detec- 
tion, the inflammasome-dependent eicosanoid production described 
here is one of the most rapid innate immune cellular responses known 
in vivo. Although performed in vivo, the results presented here rely on 
the systemic administration of toxins. It will be important to explore 
the role of inflammasome-dependent eicosanoids during live infec- 
tion. When restricted to the site of infection, such eicosanoids 
may have a beneficial role in host defence, for example by rapidly 
increasing local vascular permeability, allowing an influx of antibody, 
complement and immune cells. Future studies should also evaluate a 
role for eicosanoids in other inflammasome-dependent phenotypes 
previously described in vivo. Indeed, our results suggest that the sig- 
nalling outputs of inflammasomes may be much broader than has 
previously been realized. 


METHODS SUMMARY 

Toxin delivery and pathology. Recombinant proteins (PA, LFn-FlaA, 
LFn-FlaA(AAA) and LF) were purified from Escherichia coli as described previously, 
and endotoxin was removed with Detoxi-Gel (Pierce). LFn-FlaA(AAA) was 
generated by mutating the three carboxy-terminal leucines of L. pneumophila 
flagellin to alanine. Unless otherwise noted, standard toxin doses were 2 µg g⁻¹ 
body weight of LFn-FlaA in vivo (200 µl intraperitoneally) and 5 µg ml⁻¹ 
LFn-FlaA in vitro. PA dose was always 2 × LFn-FlaA. Rectal temperature was measured 
with a MicroTherma 2T thermometer (Braintree Scientific). Blood for haematocrit 
was collected by retro-orbital bleed into StatSpin microhaematocrit tubes (Fisher 
Scientific). 

Mice. For bone marrow chimaeras, mice were irradiated twice with 600 rad 4h 
apart, injected with 5X 10° donor cells, and analysed after 12-15 weeks. 
Macrophages were depleted for 48 h with liposome-encapsulated clodronate 
(N.v.R) (500 µl intraperitoneally and 200 µl intravenously). CD11b⁺ cells were 
depleted for 24 h in FVB-Tg(CD11b/EGFP)34Lan/J mice with 25 ng g⁻¹ body 
weight (intraperitoneally) of diphtheria toxin (Sigma-Aldrich). 

Lipidomics. Lipid autacoids were extracted by solid phase with SampliQ ODS-C₁₈ 
cartridges (Agilent Technologies). Eicosanoids and docosanoids were identified 
and quantified with a triple-quadrupole linear ion-trap LC-MS/MS system (MDS 
SCIEX 3200 QTRAP) equipped with a Kinetex C₁₈ mini-bore column. 
Statistical analysis. Statistical differences were calculated with an unpaired two- 
tailed Student’s t-test (in vitro/ex vivo) or two-tailed non-parametric Mann- 
Whitney test (in vivo) using GraphPad Prism 4.0b. 


Full Methods and any associated references are available in the online version of 
the paper. 
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METHODS 

Mice and cell culture. Except bone marrow chimaeras (see below), all mice were 
sex and age-matched at 5-8 weeks old. C57BL/6J (B6) and Bo.cKit’"”"""" mice 
were purchased from Jackson Laboratories. B6;129P2-Ptgs 1°77" (Cox1) mice 
were purchased from Taconic. B6.Nlrc4⁻/⁻ mice were from S. Mariathasan 
and V. Dixit. B6.Casp1⁻/⁻ mice were a gift from A. Van der Velden and 
M. Starnbach. B6.MyD88/Trif⁻/⁻ and FVB-Tg(CD11b/EGFP)34Lan/J were a gift 
from G. Barton. B6.Rag1⁻/⁻ mice were a gift from D. Raulet. B6.Il1b/Il18⁻/⁻ mice 
were a gift from D. Portnoy. B6.Alox5⁻/⁻ mice were provided by C. Brown. 
B6.Naip5⁻/⁻ mice were described previously. For bone marrow chimaeras, mice 
were irradiated twice with 600 rad 4h apart and reconstituted with 5 X 10° donor 
cells by injection into the tail vein. Chimaeric mice were assayed 12-15 weeks after 
irradiation. Bone marrow macrophages were differentiated from bone-marrow- 
derived precursor cells by using macrophage colony stimulating factor as 
described previously’. For thioglycollate-elicited macrophages, mice were injected 
intraperitoneally with 2 ml of 4% aged thioglycollate medium (BD Diagnostics) 4 
days before peritoneal lavage. Cell lysis was measured by LDH release assay 
(Promega) in accordance with the manufacturer’s protocol. Animal experiments 
were approved by the Animal Care and Use Committee of the National Institute of 
Allergy and Infectious Diseases, National Institutes of Health (Fig. 1a and 
Supplementary Fig. 1f) and the University of California, Berkeley Animal Care 
and Use Committee (all other figures). 

FlaTox injections and pathology. Recombinant proteins (PA, LFn-FlaA, 
LFn-FlaA(AAA) and LF) were purified from E. coli as described previously. 
Endotoxin was removed from these proteins with Detoxi-Gel (Pierce) in 
accordance with the manufacturer's protocol. LFn-FlaA(AAA) was generated by 
mutating the three carboxy-terminal leucines of L. pneumophila flagellin to 
alanine. Toxin was injected intraperitoneally or intravenously (tail vein) in 
200 µl of PBS. Unless otherwise noted, standard doses were 2 µg g⁻¹ body weight 
of LFn-FlaA in vivo (intraperitoneally) and 5 µg ml⁻¹ LFn-FlaA in vitro. PA dose 
was always 2 × LFn-FlaA. Rectal temperature was measured with a MicroTherma 
2T thermometer (Braintree Scientific) with a lubricated RET-3 probe. For 
haematocrit measurement, mice were anaesthetized briefly with isoflurane and 
blood was collected in a heparinized StatSpin 40-mm tube (Fisher Scientific) by 
retro-orbital bleed. The tube was sealed at one end with StatSpin sealant (Fisher 
Scientific) and centrifuged for 10 min at 9,000g. The percentage of red blood cells 
was quantified with a StatSpin card haematocrit reader. 
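
The dosing rule stated above (LFn-FlaA scaled to body weight, PA always at twice the LFn-FlaA dose) can be sketched as a small helper. This is only an illustration: the function name and its parameters are hypothetical and not part of the published protocol, and the default values simply restate the standard in vivo dose from the text.

```python
def flatox_dose_ug(body_weight_g, lfn_flaa_ug_per_g=2.0, pa_ratio=2.0):
    """Return (LFn-FlaA dose, PA dose) in micrograms for one mouse.

    Defaults restate the text: 2 ug LFn-FlaA per g body weight in vivo,
    and a PA dose that is always 2 x the LFn-FlaA dose.
    The helper itself is an illustrative assumption.
    """
    if body_weight_g <= 0:
        raise ValueError("body weight must be positive")
    lfn_flaa_ug = body_weight_g * lfn_flaa_ug_per_g
    pa_ug = pa_ratio * lfn_flaa_ug
    return lfn_flaa_ug, pa_ug


# Example: a 20 g mouse under the standard in vivo dose.
lfn, pa = flatox_dose_ug(20.0)
print(f"LFn-FlaA: {lfn} ug, PA: {pa} ug")
```
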

Fluid loss. Peritoneal fluid was collected with a 1-ml insulin syringe (BD 
Biosciences) and quantified by weight. Intestines (small intestine plus caecum plus 
colon) were harvested and immediately weighed. Harvested tissues were then 
dried uncovered overnight at 37 °C and weighed again. Fluid weight was calculated 
as wet weight minus dry weight. 
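
As a minimal sketch of the calculation just described, fluid weight is the difference between the wet and dried tissue weights. The function name and the sanity check are assumptions added for illustration.

```python
def fluid_weight_g(wet_weight_g, dry_weight_g):
    """Fluid weight as defined in the text: wet weight minus dry weight."""
    if dry_weight_g > wet_weight_g:
        # A dried sample heavier than the wet one indicates a weighing error.
        raise ValueError("dry weight cannot exceed wet weight")
    return wet_weight_g - dry_weight_g


# Example with made-up weights (not data from the paper).
print(fluid_weight_g(1.5, 0.5))
```
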

Cell depletion and transfer. Neutrophils were depleted by injecting B6 mice with 
200 µg (intraperitoneal; 36 h before treatment) and 150 µg (intravenous; 6 h before 
treatment) of RB6-8C5 antibody (anti-GR1; gift from D. Portnoy). Macrophages 
were depleted by injecting B6 mice intraperitoneally with 500 µl and intravenously 
with 200 µl of liposome-encapsulated clodronate (N.v.R) 48 h before FlaTox 
treatment. Clodronate liposomes were prepared as described previously. CD11b⁺ 
cells were depleted by injecting FVB:CD11b-DTR mice intraperitoneally with 
25 ng g⁻¹ body weight of diphtheria toxin (Sigma-Aldrich) 24 h before FlaTox 
treatment. 

For macrophage cell transfer (Fig. 2g), peritoneal cells were lavaged from 
naive or thioglycollate-treated mice and BMDMs were derived in culture. 
About 10’ macrophages (estimated by counting large cells in lavage) were injected 
intraperitoneally into host mice in 500 µl of PBS. After 2 h, mice were injected with 
FlaTox and monitored for changes in rectal temperature. For spleen transfer, a 
single-cell suspension was generated by pushing spleen through mesh followed by 
osmotic lysis of red blood cells. Bone marrow cells were collected from femurs and 
tibias, followed by osmotic lysis of red blood cells. Entire spleen or bone marrow 
from one mouse was transferred into one host by intraperitoneal injection and 
analysed as above. 

Eicosanoid analysis. For ex vivo analysis, total resident peritoneal cells were 
lavaged from euthanized mice and macrophage numbers estimated by counting 
only large cells. Macrophages (10°-2 X 10°; (2-4) X 10° total cells) were aliquoted 
to 1.5-ml Eppendorf tubes. Toxin treatments were performed in 1 ml of prewarmed 
Dulbecco's PBS with Ca²⁺ and Mg²⁺ (Invitrogen). After 30 min, cells were 
pelleted by centrifugation and the supernatant was immediately transferred to 
2 ml of cold methanol for storage at −80 °C. For in vivo analysis, mice were injected 
intraperitoneally with FlaTox. After 20 min, mice were euthanized and the 
peritoneum was lavaged with 1 ml of cold PBS, which was immediately 
transferred to 2 ml of cold methanol for storage at −80 °C. Before analysis, 
400 pg each of the deuterated internal standards prostaglandin E₂ (PGE₂-d4), 
15(S)-hydroxyeicosatetraenoic acid (15(S)-HETE-d8) and LTB₄-d4 were added 
to each sample to calculate the recovery of different classes of oxygenated fatty acid. 
Lipid autacoids were extracted by solid phase with SampliQ ODS-C₁₈ cartridges 
(Agilent Technologies). Eicosanoids and docosanoids were identified and 
quantified by LC-MS/MS-based lipidomics. In brief, we analysed extracted samples 
by a triple-quadrupole linear ion-trap LC-MS/MS system (MDS SCIEX 3200 
QTRAP) equipped with a Kinetex C₁₈ mini-bore column. The mobile phase was 
a gradient of A (water/acetonitrile/acetic acid (72:28:0.01 by vol.)) and B 
(propan-2-ol/acetonitrile (60:40, v/v)) with a flow rate of 450 µl min⁻¹. MS/MS analyses 
were performed in negative-ion mode, and prominent fatty acid metabolites were 
quantified by multiple reaction monitoring (MRM mode) using established 
transitions for PGE₂/PGD₂ (351→271, 351→189 m/z), TXB₂ (369→169 m/z), 
PGF₂α (353→193 m/z), 5-HETE (319→115 m/z), 12-HETE (319→179 m/z), 
15-HETE (319→175 m/z), 5,12-DiHETE/LTB₄ (335→195 m/z), LXA₄ 
(351→115 m/z), PGE₂-d4 (355→275 m/z), LTB₄-d4 (339→197 m/z), and 
15-HETE-d8 (327→182 m/z). Calibration curves (1-1,000 pg) and specific LC 
retention times for each compound were established with synthetic standards (Cayman 
Chemical). Structures were confirmed for selected autacoids by MS/MS analyses 
using enhanced product ion mode with appropriate selection of the parent ion in 
quadrupole 1. 
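
The transition list above can be captured as a lookup table, reading each m/z pair as a precursor→product transition. The sketch below is illustrative only: the table layout, the function names and the recovery correction (dividing a measured amount by the fraction of internal standard recovered) are assumptions and not the authors' analysis pipeline.

```python
# MRM transitions (precursor m/z, product m/z) as listed in the text.
MRM_TRANSITIONS = {
    "PGE2/PGD2": [(351, 271), (351, 189)],
    "TXB2": [(369, 169)],
    "PGF2alpha": [(353, 193)],
    "5-HETE": [(319, 115)],
    "12-HETE": [(319, 179)],
    "15-HETE": [(319, 175)],
    "5,12-DiHETE/LTB4": [(335, 195)],
    "LXA4": [(351, 115)],
    "PGE2-d4": [(355, 275)],
    "LTB4-d4": [(339, 197)],
    "15-HETE-d8": [(327, 182)],
}


def analytes_for(precursor, product):
    """Return the analytes monitored with a given precursor/product pair."""
    return [
        name
        for name, pairs in MRM_TRANSITIONS.items()
        if (precursor, product) in pairs
    ]


def recovery_corrected_pg(measured_pg, standard_recovered_pg, standard_added_pg=400.0):
    """Correct a measured amount for extraction losses (assumed scheme).

    400 pg of each deuterated standard is added per sample (from the text);
    scaling by the recovered fraction is an illustrative assumption.
    """
    if standard_recovered_pg <= 0:
        raise ValueError("internal standard was not recovered")
    recovery = standard_recovered_pg / standard_added_pg
    return measured_pg / recovery


print(analytes_for(319, 179))
print(recovery_corrected_pg(50.0, 200.0))
```

Keeping the transitions in one table makes it easy to check that no two analytes share a precursor/product pair without also being separated by retention time.
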

For enzyme immunoassay (EIA) of LTB₄, PGE₂ and cysteinyl LTs, total resident 
peritoneal cells were collected by lavage as above. Macrophages (2 X 10°) were 
seeded into 96-well plates and incubated for 4 h in RPMI buffer containing 10% 
FBS, 100 U ml⁻¹ penicillin, 100 µg ml⁻¹ streptomycin, 2 mM L-glutamine. Before 
assay, cells were washed once with PBS to select for adherent macrophages. Cells 
were treated for 30 min in 100 µl of PBS with Ca²⁺ and Mg²⁺. PGE₂, LTB₄ and 
cysteinyl LTs in supernatants were measured by EIA (Cayman Chemical). 
Salmonella infections. Resident peritoneal macrophages were selected from total 
peritoneal lavage by plating overnight on Petri dishes followed by rinsing with PBS 
before replating into 96-well plates (2 X 10° cells per well). Salmonella enterica 
serovar Typhimurium strain LT2 was grown overnight in 5 ml of Luria-Bertani 
medium in a shaking incubator at 37 °C. The next morning, the cultures were diluted 
1:66 in Luria-Bertani medium and grown for 3 h (standing culture, 37 °C). Bacteria 
were added to cells at a multiplicity of infection of 5 in PBS with Ca²⁺ and Mg²⁺, 
followed by centrifugation at 400g for 10 min. After 3 h, eicosanoid production and 
cell lysis were measured by EIA and LDH assay, respectively, as described above. 
Flow cytometry. Leukocytes were collected from the peritoneal cavity by lavage 
with 7 ml of PBS, from spleen passed through a nylon mesh (BD Falcon) to 
establish a single-cell suspension, or from the lamina propria as described 
previously. Cells were stained with anti-F4-80-APC (BM8; 1:200 dilution; 
eBiosciences), anti-CD11b-PE (M1/70; 1:400 dilution; eBiosciences) and 
anti-CD11c-PE-Cy7 (N418; 1:100 dilution; eBiosciences) and analysed with standard 
flow cytometry protocols. F4-80⁺/CD11b⁺ cells were isolated by 
fluorescence-activated cell sorting. 

Quantitative RT-PCR. CD11b⁺/F4-80⁺ cells were sorted from total peritoneal 
cells lavaged from naive or thioglycollate-injected (2 ml injected intraperitoneally 
4 days in advance) mice. Bone-marrow-derived macrophages are more than 95% 
CD11b⁺/F4-80⁺ and were used without sorting. RNA was isolated with the RNeasy 
kit (Qiagen) in accordance with the manufacturer's protocol. RNA samples were 
treated with RQ1 DNase (Promega) before reverse transcription with Superscript 
III (Invitrogen). Complementary DNA reactions were primed with poly(dT) for 
the measurement of mature transcripts. Quantitative PCR was performed as 
described using the Step One Plus RT PCR System (Applied Biosystems) with 
Platinum Taq DNA polymerase (Invitrogen) and EvaGreen (Biotium). Transcript 
levels were normalized to Rps17. The following primers were used in this study: 
Rps17 sense, 5'-CGCCATTATCCCCAGCAAG-3'; 
Rps17 antisense, 5'-TGTCGGGATCCACCTCAATG-3'; 
Ptgs1(Cox1) sense, 5'-ATGAGTCGAAGGAGTCTCTCG-3'; 
Ptgs1(Cox1) antisense, 5'-GCACGGATAGTAACAACAGGGA-3'; 
Alox5 sense, 5'-ACTACATCTACCTCAGCCTCATT-3'; 
Alox5 antisense, 5'-GGTGACATCGTAGGAGTCCAC-3'; 
Alox12/15 sense, 5'-GGCTCCAACAACGAGGTCTAC-3'; 
Alox12/15 antisense, 5'-AGGTATTCTGACACATCCACCTT-3'. 
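
Normalization of a target transcript to Rps17 can be illustrated with the common ΔCt calculation. The text does not state which formula was used, so the sketch below is an assumption: it takes a fixed amplification efficiency of 2 per cycle, and the function names and Ct values are hypothetical.

```python
def relative_expression(ct_target, ct_reference, efficiency=2.0):
    """Target level relative to the reference gene (for example Rps17).

    Uses efficiency ** (Ct_reference - Ct_target); an efficiency of 2
    (perfect doubling per cycle) is an assumption.
    """
    return efficiency ** (ct_reference - ct_target)


def fold_difference(ct_target_a, ct_ref_a, ct_target_b, ct_ref_b, efficiency=2.0):
    """Fold difference of normalized target expression in sample A over B."""
    level_a = relative_expression(ct_target_a, ct_ref_a, efficiency)
    level_b = relative_expression(ct_target_b, ct_ref_b, efficiency)
    return level_a / level_b


# Hypothetical Ct values: the target is detected 10 cycles earlier in
# sample A than in sample B, with identical reference-gene Ct values.
print(fold_difference(20.0, 18.0, 30.0, 18.0))
```

Under this scheme a difference of about 3.3 to 10 cycles corresponds to the 10- to 1,000-fold range of expression differences reported in the main text.
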

Western blotting. Secreted proteins from FlaTox-treated cells were collected and 
probed with anti-CASP1 p10 (sc-514; Santa Cruz) as described previously. Cell 
lysates from 2 X 10° resident peritoneal cells were probed with anti-cPLA₂ (2832; 
Cell Signaling), anti-phospho-cPLA₂ (2831; Cell Signaling) and anti-β-actin 
(sc-47778; Santa Cruz). 

Calcium flux. Resident peritoneal macrophages were selected from total peritoneal 
lavage by plating overnight on untreated Petri dishes followed by one rinse with PBS 
and replating onto eight-chamber glass slides (Thermo Scientific) coated with 
poly(D-lysine) (Sigma-Aldrich). After incubation overnight, cells were incubated 
for 45 min at 37 °C with 2.5 mM Fluo-4 (Invitrogen) plus 0.02% pluronic 
(Invitrogen) in Ringer's buffer containing 2 mM Ca²⁺. Cells were washed twice 
and incubated for 45 min in Ringer's buffer at 37 °C before transfer to microscope 
for imaging. Cells were maintained at 37 °C with CO₂ throughout imaging. 
Fluorescent (470 nm/520 nm) and differential interference contrast images were 
collected every 10 s on a Nikon Eclipse TE 2000-E microscope and analysed with 
NIS Elements AR 3.2 software. At least 30 cells were analysed for each replicate. 
Onset of cell lysis was calculated as the earliest time point at which membrane 
blebbing was visible by differential interference contrast imaging. 
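
A per-cell trace analysis of the kind described (fluorescence over background, the maximum ratio per cell, and the lead time between Ca²⁺ influx and blebbing) might be sketched as follows. The 10 s frame interval comes from the text; the threshold used to call the onset of influx, the function names and the example numbers are all assumptions made for illustration.

```python
def normalise_trace(fluorescence, background):
    """Convert raw intensities to fluorescence/background ratios."""
    if background <= 0:
        raise ValueError("background must be positive")
    return [value / background for value in fluorescence]


def max_ratio(trace):
    """Maximum ratio reached by one cell."""
    return max(trace)


def influx_lead_time_s(trace, blebbing_frame, threshold, frame_interval_s=10):
    """Seconds between the first frame at or above `threshold` and blebbing.

    `threshold` is an assumed criterion for the onset of Ca2+ influx;
    returns None if the trace never reaches it.
    """
    for frame, ratio in enumerate(trace):
        if ratio >= threshold:
            return (blebbing_frame - frame) * frame_interval_s
    return None


# Hypothetical single-cell trace, one value per 10 s frame.
trace = normalise_trace([100, 100, 300, 400, 400], background=100.0)
print(max_ratio(trace))
print(influx_lead_time_s(trace, blebbing_frame=10, threshold=2.0))
```
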
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Paramutation in Drosophila linked to emergence of a 


piRNA-producing locus 

Augustin de Vanssay, Anne-Laure Bougé, Antoine Boivin, Catherine Hermant, Laure Teysset, Valérie Delmarre, 
Christophe Antoniewski & Stéphane Ronsseray 


A paramutation is an epigenetic interaction between two alleles of 
a locus, through which one allele induces a heritable modification 
in the other allele without modifying the DNA sequence. The 
paramutated allele itself becomes paramutagenic, that is, capable 
of epigenetically converting a new paramutable allele. Here we 
describe a case of paramutation in animals showing long-term 
transmission over generations. We previously characterized a 
homology-dependent silencing mechanism referred to as the 
trans-silencing effect (TSE), involved in P-transposable-element 
repression in the germ line. We now show that clusters of 
P-element-derived transgenes that induce strong TSE can convert 
other homologous transgene clusters incapable of TSE into strong 
silencers, which transmit the acquired silencing capacity through 50 
generations. The paramutation occurs without any need for chro- 
mosome pairing between the paramutagenic and the paramutated 
loci, and is mediated by maternal inheritance of cytoplasm carrying 
Piwi-interacting RNAs (piRNAs) homologous to the transgenes. 
The repression capacity of the paramutated locus is abolished by a 
loss-of-function mutation of the aubergine gene involved in piRNA 
biogenesis, but not by a loss-of-function mutation of the Dicer-2 
gene involved in siRNA production. The paramutated cluster, previ- 
ously producing barely detectable levels of piRNAs, is converted into 
a stable, strong piRNA-producing locus by the paramutation and 
becomes fully paramutagenic itself. Our work provides a genetic 
model for the emergence of piRNA loci, as well as for RNA-mediated 
trans-generational repression of transposable elements. 

Paramutations have been well described in plants. The best 
characterized is the b1 paramutation in maize, which involves a small RNA 
silencing pathway, changes in DNA methylation levels and 
chromatin modifications, and shows full penetrance and stability across 
generations. Paramutation-like phenomena involving microRNAs have 
been described in mice. However, long-term inheritance of a 
paramutation through generations has not been reported so far in animals. 
In Drosophila melanogaster, transposition of P elements causes 
hybrid dysgenesis, a syndrome of genetic abnormalities including a 
high mutation rate, chromosome rearrangements and sterility. In 
natural populations, telomeric P elements inserted in heterochromatic 
telomere-associated sequences (TAS) are master sites for establishing 
P-element repression in the germ line. In laboratory lines (for 
example, P-1152), P-lacZ transgenes inserted in TAS mimic telomeric 
P elements by repressing germline expression of reporter transgenes 
inserted at distant euchromatic sites, through a homology-dependent 
silencing mechanism, TSE. TSE is strongly sensitive to mutations 
affecting the piRNA pathway. Its establishment involves both 
genetic and epigenetic components: a chromosomal copy of the 
telomeric silencer transgene must be either paternally or maternally 
inherited, and a cytoplasmic component containing small RNAs 
homologous to the transgene must be maternally inherited. In 
addition to telomeric loci, we found that T-1, a tandem repeat cluster 
of P-lacZ transgenes inserted in the middle of chromosome arm 2R 
(50C), can also trigger a strong TSE. T-1 and other P-lacZ clusters 
inserted at the same locus (Supplementary Fig. 1) induce ectopic 
heterochromatin and show variegation of the white gene marker in 
the eye, a phenomenon termed repeat-induced gene silencing. 
However, T-1 triggers strong silencing of various TSE reporter 
transgenes in the germ line, whereas the other transgene clusters at this 
locus, including BX2, which contains the same number of transgene 
repeats as T-1, did not induce detectable TSE (Supplementary Table 1). 

The epigenetic properties of T-1 were analysed together with those 
of the P-1152 telomeric silencer and the BX2 cluster as controls. T-1 
and P-1152 showed typical maternal transmission of TSE: strong 
repression occurred in the germ line of progeny when the silencer 
was maternally inherited (Fig. 1a), whereas weak or null repression 
was detected when the silencer was paternally inherited (Fig. 1b). BX2 
showed no repression capacity in these crosses. To analyse the rela- 
tionship between TSE and piRNAs, we sequenced 19-29-nucleotide 
RNAs from ovaries of T-1, P-1152 or BX2 females (Supplementary 
Table 2). Abundant small RNAs matched the T-1 sequences in the 
library from hemizygous females having inherited the T-1 locus 
maternally (Fig. 1c), but not paternally (Fig. 1d). Among these species, 
the 23-28-nucleotide RNAs showed the typical ‘ping-pong’ signature 
of piRNA biogenesis, including a bias for a 5′ U (1U) and a strong 
tendency to form sense-antisense pairs with complementarity over 
their first ten nucleotides (Supplementary Fig. 2). In addition to 
piRNAs, short interfering RNAs (siRNAs) have been shown to be 
produced by previously characterized piRNA loci. Similarly, T-1 
produced a significant fraction of 21-nucleotide RNAs (Fig. 1c) that 
do not show the ping-pong signature of piRNAs and probably 
correspond to siRNAs (Supplementary Fig. 3a). In agreement with a 
previous report, small RNAs with similar features were produced by 
P-1152 in hemizygous females having inherited the P-1152 locus 
maternally (Fig. 1f). Homozygous P-1152 females produced about 
twice as many piRNAs as these hemizygous females (Supplementary 
Fig. 4). Finally, only a very low level of small RNAs was produced that 
matched BX2 in hemizygous females from the BX2 line (Fig. 1e). 
Hence, maternal inheritance of T-1, as well as P-1152, is associated 
with both the production of piRNAs derived from these loci and the 
capacity of these loci to mediate TSE, thereby linking silencing and 
piRNAs in this system. We next tested epigenetic interactions between 
the P-1152 telomeric silencer and T-1, and found that chromosomal 
and maternally transmitted components of T-1 and P-1152 can 
complement each other to induce TSE (Supplementary Fig. 5), con- 
sistent with the presence of piRNAs matching P-lacZ sequences in 
ovaries of both T-1 and P-1152 females. 
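
The small-RNA features discussed in this paragraph (the 1U bias, complementarity of sense-antisense pairs over their first ten nucleotides, and the 21-nucleotide versus 23-28-nucleotide size classes) can be expressed as simple sequence checks. The sketch below is illustrative and is not the authors' analysis pipeline; the function names, class labels and example reads are assumptions.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def to_dna(seq):
    """Upper-case a read and write U as T so RNA and DNA reads compare."""
    return seq.upper().replace("U", "T")


def reverse_complement(seq):
    """Reverse complement of a read, returned in the DNA alphabet."""
    return to_dna(seq).translate(_COMPLEMENT)[::-1]


def is_ping_pong_pair(read_a, read_b, overlap=10):
    """True if the first `overlap` nt of the two reads are complementary."""
    a, b = to_dna(read_a), to_dna(read_b)
    if len(a) < overlap or len(b) < overlap:
        return False
    return a[:overlap] == reverse_complement(b[:overlap])


def fraction_1u(reads):
    """Fraction of non-empty reads that start with U (T in DNA notation)."""
    reads = [r for r in reads if r]
    if not reads:
        return 0.0
    return sum(to_dna(r)[0] == "T" for r in reads) / len(reads)


def size_class(length):
    """Size classes used in the text; the labels are illustrative."""
    if length == 21:
        return "siRNA-like"
    if 23 <= length <= 28:
        return "piRNA-like"
    return "other"


# Hypothetical reads whose 5' ends overlap by ten nucleotides.
sense = "TGCATGCATGAAAAAAAAAAAAAAA"
antisense = "CATGCATGCATTTTTTTTTTTTTT"
print(is_ping_pong_pair(sense, antisense), size_class(len(sense)))
```
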

To investigate possible transfer of epigenetic information between 
T-1 and the inactive BX2 locus, we crossed hemizygous T-1 females 
with hemizygous BX2 males, and recovered female progeny that had 
not inherited the T-1 locus but carried a paternally inherited BX2 
locus (Fig. 2). These females showed marked silencing of the TSE 
reporter transgene, indicating that the cytoplasm of T-1 oocytes can 
confer new silencing capacities to the inactive allele of the BX2 locus. 
This de novo silencing allele will be hereafter referred to as BX2* to 
differentiate it from the initial BX2 allele never having been exposed to 
a T-1 cytoplasm. 

Figure 1 | Maternal inheritance of P-1152 and T-1 repression capacities 
correlates with the presence of T-1- or P-1152-derived piRNAs in ovaries of 
female progeny. a, b, Maternal (a) and paternal (b) inheritance of TSE 
mediated by P-1152, T-1 or BX2 was tested using the P-1039 TSE reporter 
transgene. lacZ staining of ovaries of G₁ females from the indicated crosses was 
performed, and TSE was expressed as the percentage of repressed egg chambers 
among the total number (n) of egg chambers analysed. Female and male 
Cantonʸ flies are devoid of any transgene and were used as controls. Note that 
lacZ staining of follicle cells surrounding egg chambers (shown at higher 
magnification in insets) is observed in all ovaries because TSE only occurs in the 
germ line. Original magnification, ×20. c-f, Deep sequencing of small RNAs 
from ovaries of the indicated genotypes in which the maternally inherited allele 
is always indicated first. Plots show the abundance of 19-30-nucleotide (nt) 
small RNAs matching P{lacW} (c-e) or P{lArB} (f). Histograms show the 
length distributions of small RNAs matching P{lacW} or P{lArB} (dark bars), or 
only the lacZ sequence in these elements (blue bars). Positive and negative 
values correspond to sense and antisense reads, respectively. 

Figure 2 | Epigenetic induction of BX2 by T-1. a, T-1 females carrying the 
TSE reporter transgene P-1039 were crossed to BX2 males carrying a balancer 
chromosome (Bal). BX2* female progeny having inherited cytoplasm from T-1 
mothers (orange background) and a BX2 chromosome from fathers were 
stained for lacZ. b, Females carrying only the TSE reporter P-1039 were crossed 
to BX2 males. Female progeny from this cross were stained for lacZ. c, P-1039/ 
BX2* female progeny from the cross in a showed complete TSE, which was 
scored as indicated in Fig. 1. P-1039/BX2 female progeny from the cross in b did 
not show TSE. Controls correspond to crosses between Cantonʸ (devoid of any 
transgene) or T-1 females with P-1039 males, which resulted in progeny 
showing null and complete TSE, respectively. Original magnification, ×20. 

A BX2* line was established and analysed in successive generations 
(Fig. 3a). Notably, second generation (G₂) BX2* females from test 
crosses with males carrying a TSE reporter transgene still showed a 
complete TSE (Fig. 3b). This capacity to mediate TSE was fully main- 
tained over 25 generations of the BX2* line (TSE = 100%, n = 4,600). 
TSE remained very strong between G3, and Gss (99.4%, n = 22,700) 
showing a reversion rate less than 0.5% per generation at 25°C 
(Supplementary Discussion). We conclude that maternally inherited 
factors from the T-1 strain stably paramutated the BX2 locus. 

In contrast to BX2 females, ovaries of G, BX2* females contained 
abundant small RNAs matching the BX2 sequence (Fig. 3c and 
Supplementary Table 2) with a profile similar to the one observed in 
T-1 females (see Fig. 1c). The size distribution of these small RNAs 
showed a large peak corresponding to 23-28-nucleotide small RNAs 
with the piRNA ping-pong signature (Supplementary Fig. 2), as well as 
a discrete peak corresponding to a 21-nucleotide siRNA-like species of 
RNAs. Therefore, the acquired capacity of the BX2* allele to mediate 
TSE correlates with the de novo production of lacZ-derived small 
RNAs from this locus. Finally, BX2*-derived small RNAs were con- 
tinuously produced in ovaries over at least 42 generations of a BX2* 
line (Fig. 3d and Supplementary Figs 2 and 3). Together, these data 
indicate that the BX2* paramutation is associated with stable produc- 
tion of high levels of small RNAs from the BX2 locus in ovaries. 

We next tested whether the paramutated BX2* allele is paramuta- 
genic. We crossed hemizygous BX2* females with hemizygous 
naive BX2 males and recovered female progeny having inherited the

Figure 3 | BX2* paramutation occurs and is associated with the production of
small RNAs by the BX2 cluster. a, BX2* lines were established as indicated.
Bal1 and Bal2 are balancer chromosomes carrying distinct phenotypic markers.
BX2* siblings were crossed at each generation to perpetuate the BX2* line. In
addition, BX2* females were crossed at various generations (Gn) to males
carrying the P-1039 reporter, to score the TSE of BX2* in the Gn+1 female
progeny. b, TSE in BX2* females from generations G2 and G25, and in progeny of
crosses from Canton^y, T-1 and BX2 females with P-1039 males as controls. The
G25 label indicates that BX2 females inherited cytoplasm from T-1 females 25
generations before the present cross. TSE was scored as indicated in Fig. 1.
Original magnification,

cytoplasm of BX2* mothers and the BX2 locus from fathers (Fig. 4a).
This BX2 allele was then assessed in generation G) for its capacity to 
silence a TSE reporter transgene in the germline. Notably, we observed 
a complete TSE (Fig. 4a), indicating that the paternally inherited BX2 
allele was paramutated through maternal inheritance of BX2* cyto- 
plasm. This newly paramutated BX2 allele, which corresponds to a 
second-order paramutation, will be hereafter referred to as BX2*². A
BX2*² line was established and showed stable TSE over 36 generations
(Fig. 4a). Moreover, this line retained the capacity to produce large
amounts of BX2*²-derived small RNAs after 36 generations (Fig. 4b).
Following an identical mating scheme, BX2*² females were able to
paramutate a paternally inherited BX2 locus, generating a third-order
BX2*³ paramutated allele that showed full TSE capacity over
10 generations. Applying this procedure recurrently, we generated a
fifth-order paramutated BX2*⁵ allele that showed full TSE capacity
(Supplementary Fig. 6). In conclusion, the conversion of BX2 to 
BX2* by T-1 maternal cytoplasm has all the properties of a paramuta- 
tion, because it is stable over generations and the paramutated allele 
shows secondary paramutagenicity. 

Interestingly, T-1 also fully paramutated C2, another seven-copy 
transgene inserted at the same location (Supplementary Fig. 1), 
whereas lower-copy-number transgenes at this location were paramu- 
tated only transiently (Supplementary Table 3). A similar unstable 
paramutation interaction was also observed between the non-allelic 
P-1152 and BX2 loci (Supplementary Fig. 7). 

As paramutation in this system is correlated with the production of 
BX2*-derived piRNAs and siRNAs, we investigated the effect of 
aubergine and Dicer-2 loss of function on a paramutated BX2 cluster.

Figure 4 | Paramutated BX2* is paramutagenic. a, BX2* females were
crossed with BX2 males and a BX2*² line (second-order paramutation) was
established as indicated. Bal1 and Bal2 are balancer chromosomes. BX2*²
siblings were crossed at various generations to perpetuate the BX2*² line. In
addition, BX2*² females were crossed at each generation (Gn) with males
carrying the P-1039 reporter transgene to score the TSE of BX2*² in the Gn+1
female progeny. b, Abundance (graph on the left) and length distribution (black
histogram in the middle) of 19-30-nucleotide small RNAs matching the
P{lacW} transgene in ovaries from hemizygous BX2*² females from generation
G3. Length distribution of the subset of small RNAs only matching lacZ is 
shown in the blue histogram on the right.

The silencing capacity of the BX2*² cluster was completely abolished
in homozygous aubergine mutants, whereas strong silencing still took
place in Dicer-2 homozygous mutants (Supplementary Fig. 8).
Moreover, the BX2*² locus still showed full repression capacity after
four generations in a Dicer-2 homozygous mutant context. Hence, the
BX2* silencing activity requires piRNAs, whereas neither BX2* activity
nor its inheritance relies on siRNAs. In maize, paramutation can be
induced by a non-allelic transgene producing b1-repeat double-stranded
RNA (dsRNA) and siRNAs, and epigenetic inheritance of
the Kit^tm1Alf mutant allele in mice seems to result from paternal as well
as maternal transmission of small RNAs. These data indicate that
paramutations may in some instances involve small RNAs without 
interactions between alleles at the DNA or chromatin levels. Our find- 
ings that, in Drosophila, the BX2 paramutation is triggered by cyto- 
plasmic inheritance strongly support this view. 

Finally, we investigated the effect of the paramutation on transcrip- 
tion of the BX2 locus by quantitative polymerase chain reaction with 
reverse transcription (RT-qPCR). BX2 and BX2* showed similar 
steady-state levels of both sense and antisense transcripts (Supplemen- 
tary Fig. 9). This observation suggests that paramutation, rather than 
increasing the pool of piRNA precursor transcripts, activates their 
downstream processing into piRNAs. Thus, the maternally transmitted 
piRNAs could trigger production of primary piRNAs and/or ping-pong 
amplification of secondary piRNAs in the nuage. As paramutation is 
accompanied by de novo production of high levels of piRNA, it provides 
an invaluable model to determine the molecular events involved in the 
genesis of piRNA loci. 


METHODS SUMMARY 


All crosses were performed at 25 °C. lacZ expression assays were carried out using 
X-gal overnight staining”. The P-lacZ-white construct (named P{lacW}) contains 
the P-lacZ translational fusion and is marked by the mini-white gene 
(Supplementary Fig. 1 and Supplementary Table 4). Small RNA libraries from 
hand-dissected ovaries were prepared using the Illumina kit and sequenced using 
an Illumina Genome Analyzer II or an Illumina HiSeq-2000, following the 
manufacturer’s instructions. For library comparisons, read counts were normalized 
to the total number of small RNAs that matched the D. melanogaster genome and 
did not correspond to abundant cellular RNAs (ribosomal RNA, transfer RNA and 
small nucleolar RNAs). Overlap signatures were computed for each sequence data 
set by collecting the appropriate RNA reads matching P transgenes and calculating 
overlap frequencies with RNA reads on the opposite strand. 


Full Methods and any associated references are available in the online version of 
the paper.

METHODS


Experimental conditions. All crosses were performed at 25 °C and involved 3-5 
couples in most cases. lacZ expression assays were carried out using X-gal over- 
night staining as described previously”, except that ovaries were fixed for 6 min. 
Transgenes and strains. P-lacZ fusion enhancer trap transgenes P-1152, BQ16,
BC69 and P-1039 all contain an in-frame translational fusion of the Escherichia coli 
lacZ gene to the second exon of the P transposase gene and a rosy transformation 
marker*’. The P-1152 insertion (Supplementary Table 4) was mapped to the 
telomere of the X chromosome (cytological site 1A) and consists of two P-lacZ 
insertions in the same TAS unit and in the same orientation’. P-1152 is homo- 
zygous, viable and fertile. BQ16 is located at 64C in euchromatin of the third 
chromosome* (Supplementary Table 4) and is homozygous, viable and fertile. 
BC69 is inserted in chromosome 2 (Supplementary Table 4) in the first exon of 
the vasa gene and results in a vasa loss-of-function allele; consequently, it is
homozygous female sterile. P-1039 is located at 60B on the second
chromosome (Supplementary Table 4) and is homozygous lethal. P-1152 shows no
lacZ expression in the ovary, BQ16 and BC69 are strongly expressed in the nurse
cells and in the oocyte and P-1039 shows strong lacZ staining in numerous tissues
including the follicle cells, the nurse cells and the oocyte. 

P-lacZ clusters. Lines with different numbers of P-lacZ-white transgenes* located 
at cytological site 50C on the second chromosome®”® were used (Supplementary 
Table 4). The transgene(s) insertion site is located near the mRpL53 gene, in an 
AGO1 intron. This site is not a piRNA-producing locus, as observed for instance in
the deep-sequencing data set from P-1152 ovaries (data not shown). The P-lacZ- 
white construct contains the P-lacZ translational fusion and is marked by the mini- 
white gene (P{lacW}, FBtp0000204). BX2 carries seven P-lacZ copies including at 
least one defective copy inserted in direct orientations. T-1 derives from BX2 
following X-ray treatments (Supplementary Fig. 1). T-1 has chromosomal 
rearrangements including translocations between the second and the third chro- 
mosomes. After overnight staining, weak lacZ expression is detected in the follicle 
cells of BX2 and T-1 female ovaries, presumably because of a position effect at 50C, 
but no staining is observed in the germ line (data not shown). 

Lines carrying transgenes have M genetic backgrounds (devoid of P transposable 
elements), as do the multi-marked balancer stocks used in genetic experiments. The 
Canton^y and w¹¹¹⁸ lines were used as controls completely devoid of any P element
or transgene. Crosses involving P-1152 were performed with females carrying the 
telomeric transgenes in the homozygous state (except where indicated), whereas 
crosses performed with BX2 or T-1 were performed with females carrying the 
cluster in the heterozygous state (referred to as hemizygous in case of insertions) 
because of the sterility (BX2) and lethality (T-1) induced by transgene clusters. 

Two strong hypomorphic mutant alleles of aubergine induced by EMS were 
used. Both of them are homozygous female sterile, and TSE was previously
shown to be abolished by a heteroallelic combination of these alleles. aub^QC42
comes from the Bloomington Stock Center (stock no. 4968) and has not been
characterized at the molecular level. aub^N11 has a 154-bp deletion, resulting in a
frameshift which is predicted to add 16 novel amino acids after residue 740
(refs 34, 35). Dicer-2^L811fsX is a loss-of-function allele induced by EMS that has a
sequence variant at residue 811 resulting in a stop codon. It is homozygous,
viable and fertile. 

Quantification of TSE. When TSE is incomplete, variegation is observed because 
‘on/off’ lacZ expression is seen between egg chambers: that is, egg chambers can
show strong expression (dark blue) or no expression, but intermediate expression 
levels are rarely found. TSE was quantified as previously described” by determining 
the percentage of egg chambers with no expression in the germ line. 
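
As a minimal sketch of this scoring, assuming each scored egg chamber is recorded as a Boolean (True when germline lacZ staining is seen), the TSE percentage can be computed as below. The function name and the data layout are illustrative assumptions and are not taken from the original analysis.

```python
def tse_percent(germline_stained):
    """Percentage of egg chambers with no lacZ expression in the germ line.

    germline_stained: list of booleans, one per scored egg chamber
    (True = germline staining observed, False = fully repressed).
    """
    if not germline_stained:
        raise ValueError("no egg chambers were scored")
    repressed = sum(1 for stained in germline_stained if not stained)
    return 100.0 * repressed / len(germline_stained)
```

With this convention, a sample in which every egg chamber is unstained gives a complete TSE (100%), and a sample in which every egg chamber is stained gives a null TSE (0%).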

Deep sequencing analyses. Small RNAs from hand-dissected ovaries were cloned 
using the DGE-Small RNA Sample Prep Kit and the Small RNA Sample Prep v.1.5 
Conversion Kit from Illumina (libraries 1 to 5), following the manufacturer’s 
instructions, or using the TruSeq (TM) SBS v.5 Kit at Fasteris (http://www.fasteris.com/)
(libraries 6 to 8). Libraries 1 to 5 were sequenced using an Illumina
Genome Analyzer II and libraries 6 to 8 were sequenced using an Illumina
HiSeq 2000. Sequence reads in fastq format were trimmed from the adaptor sequence
5'-TCGTATGCCGTCTTCTGCTTG-3' (libraries 1 to 5) or 5'-CITGTAGGCACCATCAAT-3'
(libraries 6 to 8) and matched to the D. melanogaster genome
release 5.43 using Bowtie, as well as to the sequences of the P-element constructs
P{lArB} (FlyBase accession FBtp0000160) and P{lacW} (FlyBase accession
FBtp0000204). Only 19-30-nucleotide reads matching the reference sequences 
with 0 or 1 mismatch were retained for subsequent analysis. For global annota- 
tion of the libraries (Supplementary Table 2), we used release 5.43 of fasta 
reference files available in FlyBase, including transposon sequences (dmel-all- 
transposon_r5.43.fasta) and release 18 of miRNA sequences from miRBase 
(http://www.mirbase.org). 
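
The trimming and size-selection steps described above can be sketched as follows. This is a hedged illustration only: the text does not state how partial adaptors at the 3' end of a read were handled, so the `min_partial` rule, the function names and the toy adaptor used for testing are assumptions rather than the procedure actually used.

```python
def trim_adaptor(read, adaptor, min_partial=6):
    """Return the insert located 5' of the adaptor, or None if no adaptor is found."""
    idx = read.find(adaptor)
    if idx != -1:
        return read[:idx]
    # The read may end inside the adaptor: accept the longest adaptor prefix
    # (at least min_partial nt, an assumed threshold) matching the read's 3' end.
    for k in range(min(len(adaptor) - 1, len(read)), min_partial - 1, -1):
        if read.endswith(adaptor[:k]):
            return read[:-k]
    return None


def keep_insert(insert, min_len=19, max_len=30):
    """Keep only trimmed inserts in the 19-30-nucleotide window used in the text."""
    return insert is not None and min_len <= len(insert) <= max_len
```

Reads that pass `keep_insert` would then be aligned, allowing 0 or 1 mismatch, which is done by the aligner and is not reproduced here.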

Sequence length distributions, small RNA mapping and frequency maps were 
generated using in-house Python scripts and R (http://www.r-project.org/) to
analyse Bowtie outputs. Scripts were integrated and run in a Galaxy instance
hosted by the laboratory. The corresponding Mississippi suite of analysis work- 
flows and codes is accessible from http://www.drosophile.org upon request. For 
library comparisons, read counts were normalized (Supplementary Table 2) to the 
total number of small RNAs that matched the D. melanogaster genome and did not 
correspond to abundant cellular RNAs (rRNA, tRNA and snoRNAs). For small 
RNA mapping, we matched each individual RNA sequence to P{lArB} or P{lacW} 
and gave to each matched position a weight corresponding to the normalized 
occurrence of the sequence in the small RNA library. When RNA sequences 
matched P{lArB} or P{lacW} repeatedly, the weight was divided by the number 
of hits to these P-element constructs. 
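
The weighting scheme described above can be illustrated with a short sketch. The input layout (a list of alignments and a dictionary of raw read counts), the function name and the per-million scaling are assumptions made for the example; the text specifies only the normalizer and the division by the number of hits.

```python
from collections import defaultdict


def position_weights(alignments, read_counts, normalizer, per=1_000_000):
    """Sum normalized, multi-hit-corrected weights at each matched position.

    alignments: list of (sequence, position, strand) hits on the transgene.
    read_counts: raw occurrences of each distinct sequence in the library.
    normalizer: number of genome-matching small RNAs that are not rRNA,
                tRNA or snoRNA (the library size used for normalization).
    per: scaling constant (assumed here; the text does not give one).
    """
    hits = defaultdict(int)
    for seq, _, _ in alignments:
        hits[seq] += 1  # number of positions matched by this sequence

    weights = defaultdict(float)
    for seq, pos, strand in alignments:
        normalized = read_counts[seq] * per / normalizer
        weights[(pos, strand)] += normalized / hits[seq]
    return dict(weights)
```

Dividing by the number of hits means that a sequence matching the transgene twice contributes half of its normalized abundance to each position, so repeated regions are not counted twice.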

Distributions of piRNA overlaps (ping-pong signatures) were computed by
collecting, for each sequencing data set, all the 23-28-nucleotide RNA reads
matching P{lArB} or P{lacW} whose 5' ends overlapped with another
23-28-nucleotide RNA read on the opposite strand. Then, for each possible overlap of
1-28 nucleotides, the number of read pairs was counted. Distributions of siRNA
overlaps were computed using a similar procedure, except that 20-22-nucleotide
RNA reads were collected instead of the 23-28-nucleotide RNA reads. The
distributions of piRNA/siRNA overlaps were computed by collecting separately the
20-22-nucleotide and 23-28-nucleotide RNA reads matching P{lArB} or P{lacW},
and counting for each possible overlap of 1-22 nucleotides the number of
read pairs across these two distinct read data sets. To plot the overlap signatures,
a z-score was calculated by computing, for each overlap of i nucleotides,
the number O(i) of read pairs and converting it using the formula
z(i) = (O(i) − mean(O))/standard deviation(O).
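
The overlap signature and its z-score can be sketched as below. This is an illustrative reimplementation under stated assumptions, not the in-house scripts: reads are given as (leftmost 0-based position, length, strand) tuples, every sense/antisense pair is counted once, and the population standard deviation is used because the text does not say which estimator was applied.

```python
from collections import Counter
from statistics import mean, pstdev


def overlap_counts(reads, min_len=23, max_len=28):
    """Count sense/antisense read pairs whose 5' ends overlap by i nucleotides."""
    sense, antisense = Counter(), Counter()
    for start, length, strand in reads:
        if min_len <= length <= max_len:
            if strand == "+":
                sense[(start, length)] += 1               # 5' end is the start
            else:
                antisense[(start + length - 1, length)] += 1  # 5' end is the last base
    counts = {i: 0 for i in range(1, max_len + 1)}
    for (s5, slen), n_s in sense.items():
        for (a5, alen), n_a in antisense.items():
            i = a5 - s5 + 1  # overlap between the two 5' ends
            if 1 <= i <= min(slen, alen):
                counts[i] += n_s * n_a
    return counts


def z_scores(counts):
    """z(i) = (O(i) - mean(O)) / standard deviation(O)."""
    values = list(counts.values())
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return {i: 0.0 for i in counts}
    return {i: (o - mu) / sd for i, o in counts.items()}
```

A ping-pong signature appears as a z-score peak at an overlap of 10 nucleotides. Calling `overlap_counts(reads, min_len=20, max_len=22)` gives the corresponding siRNA overlap distribution.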
RT-qPCR experiments. Total RNA was extracted (Qiagen kit) from ovaries 
dissected from 1A-6, BX2 and BX2* females and quantified (NanoDrop). Four 
to six biological replicates were made for each genotype. For each sample, 10 µg of
RNA was treated with DNase (Fermentas). 1 µg of DNase-treated RNA was used
for reverse transcription (Fermentas) using either no primer (control RT) or two 
primers simultaneously (specific RT): one specific to the nanos transcript used as 
the sample RNA quantification reference (5’-GGATTCGCCCTCTCTAAACC-3’)
and the second specific to a region of the P{lacW} transgene. P{lacW} RT
primers were designed to be specific to the sense (s) or to the antisense (a) tran- 
scripts of five regions of the P{lacW} transgene: 5'P, 5'lacZ, 3'lacZ, 5'white and 
3'P. Sequences are: al (5'-ATTCAAACCCCACGGACAT-3’), a2 (5’-AGTA 
CGAAATGCGTCGTTTAGAGC-3’), a3 (5'-GGGGAAAACCTTATTTATCAG 
CCG-3'), a4 (5'-GCTGTTTGCCTCCTTCTCTG-3’), s1 (5’-GITTTTCCCAGT 
CACGACGTT-3’), s2 (5’-AATGCGCTCAGGTCAAATTC-3’), s3 (5'-TATGG 
AAACCGTCGATATTCAGCC-3’), s4 (5’-ATTTTTGTGGGTCGCAGTTC-3’), 
s5 (5'-TTAAGTGTATACTTCGGTAAGCTTCG-3’), s6 (5'-TTTGGGAGTT 
TTCACCAAGG-3’). One primer was both antisense and sense (as) because it is 
located in the inverted repeat of the P element. It is (5’-TGATGA 
AATAACATAAGGTGGTCCCGTCG-3’). RT primers are shown on the trans- 
gene map (Supplementary Fig. 9). qPCR was then performed on triplicates of each 
RT with a primer pair specific for the nanos gene in order to quantify the nanos 
transcripts. Simultaneously, qPCR was performed on triplicates of the same RT 
using different primer pairs corresponding to the former five regions of interest of 
P{lacW}. qPCR primer sequences are: 5’ P (5’-CTGCAAAGCTGTGACTGGAG-3' 
and 5'-TTTGGGAGTTTTCACCAAGG-3’), 5’ lacZ (5'-GAGAATCCGACGG 
GTTGTTA-3’ and 5'-AAATTCAGACGGCAAACGAC-3’), 3' lacZ (5'-ACT
ATCCCGACCGCCTTACT-3’ and 5'-GTGGGCCATAATTCAATTCG-3’), 5’ 
white (5'-GTCAATGTCCGCCTTCAGTT-3’ and 5'-GGAGTTTTGGCACAGC 
ACTT-3’) and 3’ P (5'-CCACGGACATGCTAAGGGTTAA-3’ and 5'-GTCGG 
CAAGAGACATCCACT-3’). The same series of dilutions composed of a mix of 
different RT preparations was used to normalize the quantity of nanos transcripts 
in all RT preparations leading to standard quantity (Sq) values for nanos 
transcripts in specific RT (using nanos primer = Sq(nanos)) or in control RT 
(without primer = Sq(control nanos)) preparations. A series of dilutions of a 
plasmid containing the P{lacW} transgene was used to normalize the quantity of 
transcripts of the clusters leading to Sq values for cluster transcript (Sq(specific)
and Sq(control specific)). Variations between technical triplicates seem to be
very low when compared to variations between biological replicates. The mean
of the three technical replicates was then systematically used (Sq). The measure of
the quantity of transcripts from a given region for one biological sample was
then calculated using the formula: (Sq(specific) − Sq(control specific)) /
(Sq(nanos) − Sq(control nanos)). This allowed us to eliminate the background
noise due to both sense and antisense transcripts (Sq(control transcript)) and to 
take into account variations in the quantity of RNA between biological samples 
(Sq(nanos)). 
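
The normalization formula above can be written as a short function. This is a minimal sketch: the argument names are hypothetical, and each argument is assumed to hold the Sq values of the technical triplicate for one biological sample.

```python
from statistics import mean


def transcript_level(sq_specific, sq_control_specific, sq_nanos, sq_control_nanos):
    """(Sq(specific) - Sq(control specific)) / (Sq(nanos) - Sq(control nanos)).

    Each argument is a list of technical-replicate Sq values; the mean of the
    replicates is used, as described in the text.
    """
    signal = mean(sq_specific) - mean(sq_control_specific)
    reference = mean(sq_nanos) - mean(sq_control_nanos)
    if reference == 0:
        raise ZeroDivisionError("nanos reference signal equals its no-primer control")
    return signal / reference
```

Subtracting the no-primer control removes the background signal, and dividing by the nanos reference corrects for differences in RNA input between biological samples.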
Burkitt’s lymphoma (BL) can often be cured by intensive
chemotherapy, but the toxicity of such therapy precludes its use 
in the elderly and in patients with endemic BL in developing 
countries, necessitating new strategies’. The normal germinal 
centre B cell is the presumed cell of origin for both BL and diffuse 
large B-cell lymphoma (DLBCL), yet gene expression analysis 
suggests that these malignancies may use different oncogenic 
pathways’. BL is subdivided into a sporadic subtype that is diag- 
nosed in developed countries, the Epstein-Barr-virus-associated 
endemic subtype, and an HIV-associated subtype, but it is unclear 
whether these subtypes use similar or divergent oncogenic mechan- 
isms. Here we used high-throughput RNA sequencing and RNA 
interference screening to discover essential regulatory pathways in 
BL that cooperate with MYC, the defining oncogene of this cancer. 
In 70% of sporadic BL cases, mutations affecting the transcription 
factor TCF3 (E2A) or its negative regulator ID3 fostered TCF3 
dependency. TCF3 activated the pro-survival phosphatidylinositol- 
3-OH kinase pathway in BL, in part by augmenting tonic B-cell 
receptor signalling. In 38% of sporadic BL cases, oncogenic 
CCND3 mutations produced highly stable cyclin D3 isoforms that 
drive cell cycle progression. These findings suggest opportunities to 
improve therapy for patients with BL. 

We performed RNA resequencing (RNA-seq) on 28 sporadic BL 
patient biopsies and 13 BL cell lines and reanalysed published RNA- 
seq data from 52 germinal centre B-cell-like (GCB) DLBCL cases and 
28 activated B-cell-like (ABC) DLBCL cases*. Elimination of known 
single-nucleotide polymorphisms left a set of putative single-nucleo- 
tide variants, of which 95% (495 out of 518) were confirmed by Sanger 
sequencing (Supplementary Tables 1 and 2). 

Mutations in many genes were more frequent in BL than in DLBCL, 
including MYC, as well as many not previously implicated in this 
lymphoma subtype (Fig. 1, Supplementary Fig. 1a and Supplementary
Table 1). Conversely, recurrently mutated genes in DLBCL
(EZH2, SGK1, BCL2, CD79B, MYD88) were rarely, if ever, mutated 
in BL. Several genes were mutated in BL and DLBCL (TP53, GNA13, 
MKI67, CCND3), although TP53 mutations were more common in BL 


(Fig. 1 and Supplementary Fig. 1b). This mutational survey indicates 
that BL is pathogenetically distinct from other germinal centre-derived 
lymphomas. 

Highly recurrent mutations in TCF3 and its negative regulator ID3 
indicated that TCF3 has a central role in BL pathogenesis, as it does in 
normal B-cell development by regulating the transcription of 
immunoglobulin and other B-cell-restricted genes through E-box 
motifs. ID3 and/or TCF3 mutations were present in sporadic BL,
HIV-associated BL and endemic BL, in 70%, 67% and 40% of samples,
respectively, but these mutations were rare in other lymphoid cancers
(Fig. 2a and Supplementary Table 3). In sporadic BL, ID3 mutations
(58%) were more common than TCF3 mutations (11%), and some 
tumours had mutations in both genes (13%). ID3 mutations were 
usually bi-allelic whereas TCF3 mutations were often mono-allelic 
(Supplementary Fig. 2a and Supplementary Table 3). A somatic origin 
was confirmed for 14 ID3 mutations and 4 TCF3 mutations

Figure 1 | Recurrently mutated genes in aggressive lymphomas determined
by RNA-seq. Shown are genes that were recurrently mutated in BL based on
RNA-seq analysis (≥4 out of 41 samples), as well as representative genes
known to be recurrently mutated in DLBCL. Asterisks indicate differentially
mutated genes (P < 0.05; Supplementary Table 9).

Figure 2 | TCF3 is essential for Burkitt lymphoma viability. a, TCF3 and ID3
mutation frequencies in lymphoid cancers. eBL, endemic BL; hivBL, HIV- 
associated BL; sBL, sporadic BL; MCL, mantle cell lymphoma; MM, multiple 
myeloma. b, Location of BL mutants in the crystal structure of the dimeric 
TCF3 E47 B-HLH domain”®. c, Location of BL mutants in the crystal structure 
of the ID3 HLH domain (Protein Data Bank accession number 2LFH). 

d, Selective toxicity of a TCF3 shRNA for BL lines. Shown is the fraction of 
GFP+, shRNA-expressing cells relative to the GFP−, shRNA-negative fraction
at the indicated times, normalized to the day 0 values. Data are representative of
four experiments. e, Toxicity of wild-type (WT) but not mutant ID3 isoforms
for the ID3-mutant Namalwa BL line. Shown is the fraction of GFP+,
ID3-expressing cells relative to the GFP−, ID3-negative cells, normalized to the day 0
values. Data are representative of four experiments. f, TCF3 mutants with
reduced ability to bind ID3. WT or mutant TCF3 isoforms were coexpressed
with WT ID3 in 293T cells. The indicated proteins were detected in total
cellular extracts (input) or after anti-TCF3 immunoprecipitation (IP) (left).


(Supplementary Table 3). All TCF3 mutations affected the basic helix- 
loop-helix (B-HLH) DNA-binding and dimerization domain of one 
TCF3 splice isoform (E47) but not the other (E12), indicating a 
non-redundant role for E47 in BL pathogenesis. In cases with TCF3 
mutations, E47 was more highly expressed than E12, indicating gain- 
of-function (Supplementary Fig. 2b). 

Most TCF3 mutations target four evolutionarily conserved residues 
in the B-HLH region (N551K, V557E/G, D561E/V/N, M572K; 
Supplementary Fig. 3a). The most common mutations affect V557 
and D561, which are adjacent in the crystal structure and face away 
from DNA, indicating a role in intermolecular interactions (Fig. 2b). 
The B-HLH domain may be distorted by mutations affecting M572 
and L597, which are neighbouring residues in the crystal structure. 
N551 is a DNA contact residue", indicating that N551K could alter 
TCF3 DNA binding. 

A variety of nonsense and frameshift mutations inactivate ID3 in BL
tumours, suggesting a tumour-suppressor mechanism (Supplementary 
Fig. 3b). Many missense mutations target the conserved loop region of 
ID3, potentially changing the tertiary structure of the B-HLH domain 
and impairing its ability to inhibit TCF3 (ref. 11, Fig. 2c). Numerous [D3 
missense mutations affect the HLH domain away from the interface of 
the two helices, possibly altering TCF3 interaction. Other mutations 
disrupt an ID3 splice donor and force a cryptic splice donor to be used, 
thereby deleting residues V82-Q100 (Supplementary Figs 2c and 3b). 

An RNA interference screen revealed TCF3 to be an essential gene in 
BL lines (Supplementary Fig. 2d and Supplementary Table 4), supporting
the notion that the TCF3 and ID3 mutations in BL promote TCF3
action. TCF3 knockdown caused a time-dependent toxicity in all BL 
lines, irrespective of ID3/TCF3 mutations, but had no effect on DLBCL 
lines (Fig. 2d and Supplementary Figs 2e and 4a, b). Wild-type TCF3 
rescued BL lines from the toxicity of RNA-interference-mediated

ID3 levels were quantified by densitometry and normalized to TCF3 E47 levels
(right). g, BL-derived mutant ID3 proteins are less stable than WT ID3 and 
bind TCF3 less well. Mutant or WT ID3 isoforms were expressed in the ID3- 
deficient Namalwa BL line. The indicated proteins were detected in total 
cellular extracts (input) or after anti-TCF3 IP. h, TCF3(N551K) is an altered- 
specificity mutant. Shown on top are DNA base frequencies of the most 
enriched motifs in peaks bound more than fourfold more or less by 
TCF3(N551K) compared to WT TCF3. The mean number (±s.e.m.) of the
indicated motifs per differentially bound peak is plotted below. i, A TCF3 gene 
expression signature expressed in BL and normal germinal centre B cells. Gene 
expression changes were profiled in ID3-mutant BL lines following TCF3 
knockdown or WT ID3 overexpression. Shown are genes that were 
downregulated by at least 0.33 log2 in >70% of samples. Average expression of
these genes in the indicated lymphoma subtypes based on published data” and 
in B-cell subpopulations based on RNA-seq is shown. 


depletion of TCF3, as did the TCF3 mutants, indicating that they are 
not loss-of-function (Supplementary Fig. 2f). Introduction of wild-type
ID3 into BL lines with ID3 mutations was lethal, but BL-derived
ID3 mutants had less or no toxicity, consistent with a tumour- 
suppressor mechanism (Fig. 2e and Supplementary Figs 2g and 4c). 

The common ID3 and TCF3 mutants diminished their inhibitory 
heterodimerization. TCF3(V557E) and TCF3(D561E) did not associate 
well with ID3 and failed to stabilize ID3 protein expression, unlike 
wild-type TCF3 (Fig. 2f and Supplementary Figs 2h and 4d). 
Likewise, the ID3 mutant proteins were expressed less well than 
wild-type ID3 and were less able to co-immunoprecipitate TCF3 
(Fig. 2g and Supplementary Fig. 4e). However, TCF3(N551K) behaved 
like wild-type TCF3 in these dimerization assays, indicating a distinct 
mechanism. 

We next used chromatin immunoprecipitation followed by sequen- 
cing (ChIP-seq) analysis to gauge the ability of the TCF3 mutants to 
interact with chromatin genome-wide. We engineered two BL lines to 
express biotinylated wild-type or mutant TCF3 isoforms (TCF3- 
Biotag; see Methods), allowing us to precipitate bound chromatin with 
streptavidin. For comparison, we used anti-TCF3 antibodies to precip- 
itate chromatin in unmanipulated BL cells (Supplementary Table 5). 
Both the endogenous and TCF3-Biotag ChIP-seq peaks were enriched 
for E-box motifs (CAG(G/C)TG) and overlapped extensively 
(Supplementary Fig. 5a). In 25-base-pair bins bound by wild-type 
TCF3-Biotag, the V557E, D561E and N551K isoforms had over- 
lapping ChIP-seq tags (>7) in 98%, 98% and 92% of instances, respect- 
ively, but the overlap was only 10% for control Biotag ChIP-seq data 
(P<10 °°). Hence, all TCF3 mutants bound a large number of
genomic targets equivalently. 

Given the lower overlap between TCF3(N551K) and wild-type 
TCF3 chromatin binding, we identified genomic regions that had
fourfold greater (n = 212) or lesser (n = 139) association with wild-type
TCF3 than TCF3(N551K) (P< 10 1°) (Supplementary Table 6
and Supplementary Fig. 5b). In these binding regions, TCF3(V557E) 
and TCF3(D561E) behaved like wild-type TCF3. The peaks bound 
preferentially by wild-type TCF3 contained multiple copies of the 
motif 5’-NNCACCTG-3’ whereas the peaks bound preferentially by 
TCF3(N551K) were enriched for the sequence 5’-GGCAGCTG-3’ 
(Fig. 2h). Whereas both motifs match the E-box consensus, these 
results indicate that TCF3(N551K) is an altered-specificity mutant that 
has somewhat different genomic targets than wild-type TCF3. 
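
As an illustration of how E-box sites of the form CAG(G/C)TG can be counted in a peak sequence, the sketch below scans one strand for the motif and its reverse complement (CAGGTG, CAGCTG and CACCTG). It is an assumed, simplified stand-in for the motif analysis, whose actual software and parameters are not described in this section; the function name is hypothetical.

```python
import re

# CAG(G/C)TG read on either strand: CAGGTG, the palindromic CAGCTG, and
# CACCTG (the reverse complement of CAGGTG). The lookahead lets matches overlap.
EBOX_SITE = re.compile(r"(?=CA(?:GG|GC|CC)TG)")


def count_ebox_sites(sequence):
    """Number of CAG(G/C)TG E-box sites in a DNA sequence, counting both strands."""
    return len(EBOX_SITE.findall(sequence.upper()))
```

Scanning a single strand with the three-hexamer pattern avoids counting the palindromic CAGCTG site twice, which would happen if both strands were searched separately.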

To gain insight into the biological processes controlled by TCF3 in 
BL, we profiled changes in gene expression following TCF3 knock- 
down and following wild-type ID3 expression in ID3-mutant BL lines. 
We identified 139 ‘TCF3-upregulated’ genes that were decreased in
expression by both manipulations and 166 ‘TCF3-downregulated’
genes that were increased in expression (false discovery rate 
(FDR) = 0.017; Fig. 2i and Supplementary Figs 2i and 6a). TCF3 
ChIP-seq peaks were enriched among TCF3-upregulated genes 
(58%; P=1.81X10 *°) and among TCF3-downregulated genes 
(32%; P = 1.03 X 10 *) (Supplementary Fig. 6a). We will refer to such 
genes as ‘TCF3 direct targets’.

Most TCF3-upregulated genes were more highly expressed in BL 
than in DLBCL, whereas TCF3-downregulated genes were generally 
expressed at lower levels in BL (P = 0.001; Fig. 2i and Supplementary 
Fig. 6a, b). BL tumours with ID3 and/or TCF3 mutations had higher
expression of the TCF3-upregulated signature than tumours with 
wild-type ID3 and TCF3, and the opposite was true for the TCF3- 
downregulated signature (P = 0.0001; Supplementary Fig. 6c). Hence, 
the transcriptional influence of TCF3 on the BL phenotype seems to 
be accentuated by ID3/TCF3 mutations. TCF3-upregulated genes
were more highly expressed in germinal centre B cells than in resting 
or activated blood B cells, and the reverse was true for TCF3- 
downregulated genes (Fig. 2i and Supplementary Fig. 6a), indicating 
that BL ‘inherits’ the TCF3 gene expression program from its normal 
cellular counterpart. 


Biological insights from this analysis include the fact that the 
negative regulators of TCF3—ID1, ID2 and ID3—were direct targets 
of TCF3 transactivation, thereby creating a negative feedback loop 
(Fig. 2i and Supplementary Fig. 5a). By RNA-seq, ID3 was 38-fold 
and 12-fold more highly expressed in BL than ID1 and ID2, respect- 
ively, accounting for the preferential mutation of ID3 in BL. TCF3 also 
positively regulated genes that have crucial roles in germinal centre 
B-cell biology (POU2AF1, CXCR4, LTB, CCND3). TCF3 upregulated 
CCND3 and E2F2 while downregulating RB1, thereby promoting cell 
cycle progression. 

Two components of the B-cell receptor (BCR), the immunoglobulin 
heavy and light chains, were both upregulated by TCF3 in BL, as in 
normal B cells*'? (Fig. 2i and Supplementary Figs 2i, j and 5a). In this 
regard, it was notable that knockdown of the BCR subunit CD79A was 
toxic for several BL lines in our RNA interference screen 
(Supplementary Fig. 7a). Two-thirds of BL lines were clearly BCR- 
dependent, on the basis of a time-dependent decrease in their viability 
following knockdown of either CD79A or the BCR-associated kinase 
SYK (Fig. 3a). Unlike ABC DLBCL lines, which have a ‘chronic active’ 
form of BCR signalling, BL lines do not require the NF-κB pathway 
for survival because they were not killed by an IκB kinase β inhibitor 
and had little or no dependence on CARD11 (ref. 13), an adaptor that 
engages NF-κB (Supplementary Fig. 7a-c). Rather, CD79A or SYK 
depletion in BL lines decreased AKT phosphorylation, a marker of 
phosphatidylinositol-3-OH (PI(3)) kinase signalling (Fig. 3b and 
Supplementary Fig. 7d), indicating that the BCR-dependency in BL 
is akin to ‘tonic’ BCR signalling, a phenomenon that engages pro-survival 
PI(3) kinase signalling more than NF-κB. 

TCF3 knockdown decreased phospho-AKT levels in all BCR- 
dependent lines tested, as did ID3 overexpression (Fig. 3b and 
Supplementary Fig. 7d, e), perhaps due to decreased cell-surface 
BCR expression following TCF3 depletion (Fig. 3c). In addition, 
a direct TCF3 target, PTPN6, encodes the phosphatase SHP-1, an 
inhibitor of BCR signalling (Supplementary Figs 2j and 5a). TCF3 
depletion increased SHP-1 mRNA and protein levels, indicating 

Figure 3 | Tonic BCR signalling and PI(3) kinase activity in Burkitt’s 
lymphoma. a, CD79A and SYK shRNAs are toxic for a subset of BL lines. 
Shown is the fraction of GFP+, shRNA-expressing cells relative to the GFP−, 
shRNA-negative fraction at the indicated times, normalized to the day 0 values. 
BCR-dependent BL lines are depicted using red colours. The BCR-dependent 
ABC DLBCL line TMD8 (ref. 4) is also shown. Data are representative of three 
experiments. b, Knockdown of CD79A, SYK or TCF3 reduces PI(3) kinase 
activity. Following induction of the indicated shRNAs for 2 days, 
shRNA-expressing (GFP+) cells were analysed by fluorescence-activated cell sorting 
(FACS) for phospho-S473-AKT as a measure of PI(3) kinase activity. c, TCF3 
regulates surface BCR expression in BL. Following induction of the indicated 
shRNAs for 1 day, surface BCR expression (CD79B) was quantified by FACS in 
shRNA-expressing (GFP+) cells. d, TCF3 suppresses PTPN6 (SHP-1) 
expression. A TCF3 shRNA was induced in BL lines for 2 days, followed by 
immunoblotting for the indicated proteins. e, SHP-1 suppresses phospho- 
S473-AKT in BL lines. BL lines were transduced with a SHP-1 expression vector 
(+) or empty vector (−), whereupon the indicated proteins were analysed by 
immunoblotting. f, BL lines have constitutive PI(3) kinase activity. The 
indicated proteins were analysed by immunoblotting, before and after 
treatment with the PI(3) kinase inhibitor LY294002. g, PI(3) kinase inhibition is 
toxic to BL lines. Viable BL cells were quantified by MTS assay following 
treatment for 4 days with the indicated concentrations of the pan-class I PI(3) 
kinase inhibitor BKM120. h, A signature of rapamycin-responsive genes is 
highly expressed in BL. Changes of gene expression were profiled over time in 2 
BL lines following rapamycin (100 pM) treatment. Genes consistently 
downregulated in both lines were chosen (see Methods), and their expression in 
lymphoma biopsies” is shown based on the colour scale. PMBL, primary 
mediastinal B-cell lymphoma. i, The rapamycin-upregulated and 
-downregulated signatures distinguish BL and GCB DLBCL. Genes are ranked 
according to their expression in BL versus GCB DLBCL (T-statistic) and 
rapamycin signature genes are indicated with a green hash mark. 
Kolmogorov–Smirnov P values are shown. 

TCF3 repression (Fig. 3d and Supplementary Figs 2i and 6a). Ectopic 
provision of SHP-1 decreased phospho-AKT in BCR-dependent BL 
lines, indicating that TCF3 repression of SHP-1 may contribute to 
tonic BCR signalling and PI(3) kinase activation in BL (Fig. 3e). 

A screen of a larger number of BL lines revealed that all had PI(3) 
kinase-dependent AKT phosphorylation and engagement of the 
mTOR pathway, as judged by phosphorylation of p70 S6 kinase 
(Fig. 3f). Treatment of BL lines with BKM120, a PI(3) kinase inhibitor 
in clinical trials, or rapamycin, an inhibitor of the mTORC1 complex, 
was toxic to most BL lines (Fig. 3g and Supplementary Fig. 7f). Of note, 
both BCR-dependent and -independent lines had constitutive PI(3) 
kinase signalling. Other mechanisms to activate PI(3) kinase in BL 
include PTEN mutations, which were infrequent (7%), and tenfold 
overexpression (compared to DLBCL) of the MYC-dependent gene 
MIR17HG, which encodes a microRNA that inhibits PTEN expression 
(Supplementary Fig. 7g). To judge whether the PI(3) kinase 
pathway may be active in primary BL tumours, we identified genes 
that were significantly up- or downregulated following rapamycin 
treatment of BL lines (FDR= 0.0022; Supplementary Table 7). 
Among BL and GCB DLBCL biopsies, rapamycin-downregulated 
genes were generally more highly expressed in BL (P = 0.026), whereas 
rapamycin-upregulated genes had the opposite enrichment pattern 
(P=0.007) (Fig. 3i), indicating that PI(3) kinase-dependent 
mTORC1 activity is a consistent feature of BL tumours. 
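The comparison of rapamycin signature genes against the remaining genes, ranked by their BL versus GCB DLBCL T-statistic, rests on a Kolmogorov–Smirnov test (Fig. 3i). The sketch below computes only the two-sample KS statistic, using the standard library; the function name and the T-statistic values are hypothetical, and converting the statistic to a P value would need a statistics package.

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the empirical cumulative distributions of the two samples."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance past every value equal to x in both samples.
        while i < len(a) and a[i] == x:
            i += 1
        while j < len(b) and b[j] == x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


# Hypothetical T-statistics (BL versus GCB DLBCL), for illustration only.
signature_t = [2.1, 3.4, 1.8, 2.9, 0.7]
background_t = [-1.2, 0.3, -0.5, 1.1, -2.0, 0.1, 0.9, -0.8]
print(f"KS statistic D = {ks_statistic(signature_t, background_t):.3f}")
```

A large D indicates that the signature genes are concentrated at one end of the ranking rather than spread evenly through it.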

Another aspect of BL pathogenesis was revealed by recurrent muta- 
tions in the TCF3 direct target CCND3, encoding cyclin D3, a required 
regulator of the G1-S cell cycle transition in germinal centre B cells'”"*. 
CCND3 mutations were frequent in sporadic BL (38%) and HIV- 
associated BL (67%) but not endemic BL (1.8%), indicating a distinct 
genetic pathogenesis for this latter BL subtype (Fig. 4b). At a lower 
frequency, CCND3 mutations were also present in ABC and GCB 
DLBCL**’. Multiple nonsense and frameshift mutations removed 
up to 41 amino acids from the cyclin D3 C terminus (Fig. 4a and 
Supplementary Table 8). Recurrent missense mutations affected 
threonine 283 (T283), known to be involved in D-type cyclin 
phosphorylation and stability’’, as well as nearby proline (P284) and 
isoleucine (I290) residues. These cyclin D3 residues were conserved in 
evolution, and similar residues are present in cyclin D1 and D2 
(Fig. 4a). Most mutations were heterozygous and their somatic origin 
was confirmed in five cases (Supplementary Table 8). 

To explore the function of the cyclin D3 mutants, we constructed 
fusion proteins linking green fluorescent protein (GFP) to either wild- 
type or mutant cyclin D3. All mutant isoforms accumulated to more 
than tenfold higher levels than the wild-type isoform (Fig. 4c), and 
pulse-chase analysis showed that the mutant cyclin D3 isoforms have 
longer half lives (Supplementary Fig. 8a). To test the oncogenic poten- 
tial of the cyclin D3 mutants, we transduced GFP-tagged wild-type or 
T283A cyclin D3 into lymphoma lines in which endogenous cyclin D3 
was knocked down. Cells transduced with T283A cyclin D3 had a 
marked proliferative advantage over untransduced cells, but wild-type 
cyclin D3 had little effect (Fig. 4d). Separately, our RNA interference 
screen revealed that BL and GCB DLBCL lines depend on cyclin D3 
and CDK6, a kinase that partners with D-type cyclins, irrespective of 


Figure 4 | Oncogenic CCND3 mutations in Burkitt’s lymphoma. a, Cyclin 
D3 residues affected by the indicated mutations in each lymphoma subtype. 
Amino acids 250-292 of NP_001751 are shown. FL, follicular lymphoma; FS, 
frameshift. b, Frequencies of CCND3 mutations in different lymphoma 
subtypes. c, CCND3 mutations increase protein stability. FACS analysis of the 
Gumbus BL line transduced with WT or mutant GFP-CCND3 fusion proteins. 
d, The T283A cyclin D3 mutant confers a proliferation advantage. Expression 
of endogenous CCND3 was knocked down in Gumbus (BL) and BJAB (GCB 
DLBCL) cells and different GFP-CCND3 isoforms were ectopically expressed. 
The relative number of GFP-CCND3-expressing cells is plotted over time of 
shRNA and GFP-CCND3 induction, normalized to day 0. Data are 
representative of three experiments. e, CCND3 shRNAs are selectively toxic for 
BL and GCB DLBCL lines. Shown is the fraction of GFP+, shRNA-expressing 
cells relative to the GFP−, shRNA-negative fraction at the indicated times, 
normalized to the day 0 values. Data are representative of four experiments. 
f, Cell cycle block in G1 phase is lethal to cyclin D3-mutant lymphoma lines. 
Lines were treated with the CDK4/6 inhibitor PD 0332991 (1 μM) over the 
indicated time course and analysed for viable cells in G1 phase, total viable cells 
and apoptotic cells. Data were normalized as indicated and are representative of 
three experiments. g, Therapeutic potential of PD 0332991 revealed using a BL 
xenograft model. Immunodeficient mice bearing established subcutaneous 
xenografts of the Gumbus BL line (engineered to express luciferase) were 
treated with PD 0332991 (150 mg per kg per day per os) for the indicated times. 
Tumour volumes were estimated by luciferase luminescence. Error bars are 
s.e.m. (n = 3). h, Schematic of recurrent oncogenic pathways in Burkitt’s 
lymphoma. Gain-of-function and loss-of-function aberrations are indicated by 
+ signs and by X signs, respectively. Grey boxes indicate drugs that block these 
deregulated pathways. 

CCND3 mutational status (Fig. 4e, Supplementary Fig. 8b-d and 
Supplementary Table 4). Hence, BL lines rely on cyclin D3/CDK6 
for cell cycle progression, an effect augmented by oncogenic cyclin 
D3 mutations. The BL cell cycle is also deregulated by nonsense 
and frameshift mutations or homozygous deletions in CDKN2A, 
encoding the CDK6 inhibitor p16 (Supplementary Fig. 8e and 
Supplementary Table 8). 
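The shRNA toxicity readout used in Figs 3a and 4e (the GFP+, shRNA-expressing fraction relative to the GFP−, shRNA-negative fraction, normalized to day 0) can be sketched as follows. The function name and the time course are hypothetical, and the sketch assumes that the GFP− fraction is simply one minus the GFP+ fraction.

```python
def normalized_gfp_ratio(gfp_positive_fraction_by_day):
    """Map day -> (GFP+ fraction / GFP- fraction), scaled so day 0 equals 1.

    Assumes every cell is either GFP+ or GFP-, so GFP- = 1 - GFP+.
    """
    def ratio(frac):
        return frac / (1.0 - frac)

    baseline = ratio(gfp_positive_fraction_by_day[0])
    return {day: ratio(frac) / baseline
            for day, frac in sorted(gfp_positive_fraction_by_day.items())}


# Hypothetical time course: the GFP+ fraction falls as a toxic shRNA is expressed.
time_course = {0: 0.50, 2: 0.40, 4: 0.25, 6: 0.10}
for day, value in normalized_gfp_ratio(time_course).items():
    print(f"day {day}: {value:.2f}")
```

Values that fall below 1 over time indicate that shRNA-expressing cells are being lost relative to their shRNA-negative neighbours in the same culture.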

To explore this pathway as a therapeutic target, we treated BL, 
GCB DLBCL and mantle cell lymphoma (MCL) lines with a CDK4/ 
6 inhibitor (PD 0332991) daily for 2 weeks. After an arrest in G1 phase, 
the BL and GCB DLBCL lines began to die by day 2, with a steady 
accumulation of apoptotic cells over time, whereas the MCL line 
arrested in G1 phase but did not die (Fig. 4f). Treatment of a BL 
xenograft model after the establishment of tumours with PD 
0332991 profoundly reduced tumour volume after 6 days, resulting 
in the virtual disappearance of tumour cells by day 10 (Fig. 4g and 
Supplementary Fig. 8f). 

By merging functional and structural genomic data we have 
uncovered previously unappreciated pathways in BL pathogenesis, 
several of which are amenable to therapeutic attack (Fig. 4h). The 
majority of BL tumours acquire mutations that free TCF3 from ID3 
inhibition. These mutations ‘hard-wire’ a TCF3 transcriptional pro- 
gram that is characteristic of germinal centre B cells and distinguishes 
BL from other aggressive lymphomas. BL lines require TCF3 for sur- 
vival, in part because it augments pro-survival PI(3) kinase signalling 
by intensifying a tonic form of BCR signalling. The oncogenic synergy 
between the MYC and PI(3) kinase pathways that is suggested by our 
study is supported by the generation of BL-like tumours in mice in 
which these two pathways are deregulated’’. Additionally, the key role 
of cyclin D3/CDK6 in BL pathogenesis is reinforced by the identifica- 
tion of cyclin D3 mutants in this mouse model. 

Whereas high-dose chemotherapy can often cure BL in younger 
patients from developed countries’, these regimens are unsafe in older 
patients and cannot be deployed in less developed regions due to 
immune suppression and to logistical difficulties that preclude effective 
delivery”’. Hopefully, the new insights into BL pathogenesis described 
herein will prompt clinical evaluation of drugs targeting the PI(3) 
kinase pathway, tonic BCR signalling, and cyclin D3/CDK6 in BL. 
Eventually, the rational combination of such targeted agents could 
provide more effective and less-toxic treatment of BL worldwide. 


METHODS SUMMARY 


RNA-Seq was performed using established Illumina protocols on a HiSeq 2000 
sequencer. RNA interference screening and cellular toxicity assays were conducted 
as described*'*. Gene expression profiling was performed using Agilent 4 x 44K 
microarrays. Detailed experimental and analytic procedures are presented in 
Supplementary Methods. 

Structural basis for RNA-duplex recognition and 
unwinding by the DEAD-box helicase Mss116p 


Anna L. Mallam, Mark Del Campo, Benjamin Gilman, David J. Sidote & Alan M. Lambowitz 


DEAD-box proteins are the largest family of nucleic acid helicases, 
and are crucial to RNA metabolism throughout all domains of 
life’. They contain a conserved ‘helicase core’ of two RecA-like 
domains (domains (D)1 and D2), which uses ATP to catalyse the 
unwinding of short RNA duplexes by non-processive, local strand 
separation’. This mode of action differs from that of translocating 
helicases and allows DEAD-box proteins to remodel large RNAs 
and RNA-protein complexes without globally disrupting RNA 
structure*. However, the structural basis for this distinctive mode 
of RNA unwinding remains unclear. Here, structural, biochemical 
and genetic analyses of the yeast DEAD-box protein Mss116p indi- 
cate that the helicase core domains have modular functions that 
enable a novel mechanism for RNA-duplex recognition and 
unwinding. By investigating D1 and D2 individually and together, 
we find that D1 acts as an ATP-binding domain and D2 functions 
as an RNA-duplex recognition domain. D2 contains a nucleic-acid- 
binding pocket that is formed by conserved DEAD-box protein 
sequence motifs and accommodates A-form but not B-form 
duplexes, providing a basis for RNA substrate specificity. Upon a 
conformational change in which the two core domains join to form 
a ‘closed state’ with an ATPase active site, conserved motifs in D1 
promote the unwinding of duplex substrates bound to D2 by 
excluding one RNA strand and bending the other. Our results 
provide a comprehensive structural model for how DEAD-box 
proteins recognize and unwind RNA duplexes. This model 
explains key features of DEAD-box protein function and affords 
a new perspective on how the evolutionarily related cores of other 
RNA and DNA helicases diverged to use different mechanisms. 

Mss116p is a DEAD-box RNA helicase that facilitates the folding 
and splicing of mitochondrial group I and group II introns primarily 
by acting as an RNA chaperone that unwinds RNA duplexes to disrupt 
stable but inactive RNA structures*’. The RecA-like helicase core 
domains of Mss116p, which together catalyse RNA unwinding, 
contain conserved DEAD-box protein sequence motifs that are 
required for helicase function (Fig. 1a). D2 also includes a non- 
conserved carboxy-terminal extension (CTE) that stabilizes the 
domain and extends its RNA-binding surface'®. Small-angle X-ray 
scattering studies show that without substrates, the helicase core of 
Mss116p adopts an extended ‘open state’ conformation, as observed 
for other DEAD-box proteins'’”*. A compact ‘closed state’, the X-ray 
crystal structure of which has been determined for Mss116p and other 
DEAD-box proteins'*’’, is formed upon binding ATP and single- 
stranded RNA (ssRNA) and is thought to represent a ‘post-unwound’ 
state of the enzyme. 

The wide separation of D1 and D2 of Mss116p in the open state 
suggests that they might function independently to recognize ATP and 
RNA substrates. To investigate the roles of the individual helicase core 
domains, we compared the ATP and double-stranded RNA (dsRNA) 
binding properties of the full core (D1D2) and isolated D1 and D2 of 
Mss116p. Gel-filtration and ATP-agarose binding assays show that 
ATP binds to D1 and the full core with similar affinities, but does 
not bind appreciably to D2 (Fig. 1b and Supplementary Fig. 1). This 
result is consistent with previous studies that establish D1 of DEAD- 
box proteins as a conserved ATP-binding domain'*’. Conversely, 
fluorescence anisotropy and electrophoretic mobility shift assays 
(EMSA) show that a 14-base pair (bp) dsRNA binds to D2 and the 
full core with similar affinities, but does not bind appreciably to D1 
(Fig. 1c and Supplementary Fig. 2). The preferential binding of ATP by 
D1 and dsRNA by D2 with similar affinities to the full core supports 
the hypothesis that these helicase domains function independently in 
initial substrate capture in the open state of Mss116p. 
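Affinities of this kind are typically extracted by fitting a binding isotherm to the anisotropy data. The sketch below is illustrative only: it assumes a simple one-site model, r = r_free + Δr·[P]/(Kd + [P]), with protein in excess over labelled RNA, which may differ from the regression model described in the Methods. All names and data values are hypothetical.

```python
def fit_one_site(protein_conc, anisotropy, kd_grid):
    """Return (kd, r_free, delta, sse) minimising the squared error for
    r = r_free + delta * [P] / (Kd + [P]).

    For each trial Kd the model is linear in r_free and delta, so those
    two parameters are solved in closed form by ordinary least squares.
    """
    n = len(protein_conc)
    mean_r = sum(anisotropy) / n
    best = None
    for kd in kd_grid:
        f = [p / (kd + p) for p in protein_conc]  # fraction of RNA bound
        mean_f = sum(f) / n
        sxx = sum((x - mean_f) ** 2 for x in f)
        sxy = sum((x - mean_f) * (y - mean_r) for x, y in zip(f, anisotropy))
        delta = sxy / sxx
        r_free = mean_r - delta * mean_f
        sse = sum((r_free + delta * x - y) ** 2 for x, y in zip(f, anisotropy))
        if best is None or sse < best[3]:
            best = (kd, r_free, delta, sse)
    return best


# Synthetic, noise-free data generated with Kd = 50 (arbitrary units).
conc = [1, 3, 10, 30, 100, 300, 1000]
obs = [0.05 + 0.15 * c / (50 + c) for c in conc]
grid = [10 ** (k / 20) for k in range(0, 61)]  # 1 to 1000, log-spaced
kd, r_free, delta, sse = fit_one_site(conc, obs, grid)
print(f"best-fit Kd on the grid: {kd:.1f}")
```

Fitting D1, D2 and D1D2 curves with the same model is what allows their affinities to be compared directly; a grid search is used here only to keep the example dependency-free.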

We next determined a crystal structure of Mss116p D2 in complex 
with the same 14-bp dsRNA used in the binding assays (Fig. 2a) at 

Figure 1 | The distinct substrate-binding characteristics of the helicase core 
domains of Mss116p. a, Schematic of the domain architecture of the helicase 
core of Mss116p (D1, blue; D2, green; CTE of D2, orange), indicating conserved 
DEAD-box sequence motifs defined according to ref. 24. Full-length Mss116p 
contains additional unstructured N-terminal (residues 37-87) and C-terminal 
(residues 598-664) regions that are not required for helicase activity*”. 

b, Affinity of ATP for D1, D2, and the full helicase core (D1D2) measured by 
gel-filtration chromatography under equilibrium conditions. ATP binding was 
also assessed by an ATP-agarose binding assay (Supplementary Fig. 1). 

c, Affinity of FAM-dsRNA (Fig. 2a) for MBP-tagged D1, D2 and D1D2 
determined by fluorescence anisotropy measurements. Similar results for 
dsRNA binding were obtained by EMSA (Supplementary Fig. 2). Error bars in 
b and c represent the standard error for at least three independent 
measurements, and the error in the Kd represents the standard error of the 
nonlinear regression (see Methods). NB, no significant binding. 

Figure 2 | Crystal structures of Mss116p D2 bound to A-form duplexes. 

a, The 14-bp self-complementary GC-rich RNA-duplex substrate. b, The 14-bp 
GC-rich chimaeric RNA-DNA-duplex substrate. c-e, Orthogonal views of the 
D2-dsRNA complex coloured as in a and Fig. 1a. Helix α14 of D2, which 


3.2 Å resolution, the first structure of a DEAD-box helicase domain 
bound to a duplex substrate (Fig. 2c-e, Supplementary Fig. 3 and 
Supplementary Table 1). The crystallographic asymmetric unit con- 
tains four very similar complexes with protein molecules bound on 
either side of a pseudo-continuous RNA duplex (Supplementary 
Fig. 3a). The structure of a single complex shows that D2 contains a 
positively charged binding pocket for an RNA duplex of A-form 
geometry (Fig. 2c-e and Supplementary Fig. 3b). One duplex strand 
(strand 1) interacts extensively with D2 (Fig. 3a, b and Supplementary 
Table 2). These interactions include multiple contacts to the phosphate 
groups of the three centrally bound nucleotide residues (N4-N6) by 
DEAD-box motifs IV, IVa, V, and a loop containing motif Va. The 
second strand (strand 2) makes only a few contacts, which include 
hydrogen bonds between 2'-OH groups and the CTE (Fig. 3a, c and 
Supplementary Table 2). No protein contacts are observed to the RNA 
bases of either strand (Fig. 3), consistent with the non-specific RNA 
binding shown by Mss116p and other DEAD-box proteins’. Except for 
the contact by motif Va, the structure of D2 and its contacts to the 
phosphate backbone of strand 1 are the same as in the closed-state 
structure of Mss116p, in which D2 additionally interacts with D1 and 
adenosine nucleotide (Fig. 3a and Supplementary Fig. 4; root mean 
squared deviation = 0.46 Å). Given the similar binding affinities 
observed for dsRNA by isolated D2 and the full helicase core 
(Fig. 1c and Supplementary Fig. 2), we propose that the D2-dsRNA 
structure provides a model for the initial complex of Mss116p with 
duplex RNA in the open state of the enzyme and a structural basis for 
dsRNA recognition by Mss116p. 
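The root mean squared deviation quoted for the comparison of D2 in the two structures can be sketched as below. This is a minimal illustration that assumes the two coordinate sets are already superposed and paired atom by atom; the function name and coordinates are hypothetical, and structure-comparison software would normally perform the superposition first.

```python
from math import sqrt


def rmsd(coords_a, coords_b):
    """Root mean squared deviation between two equal-length lists of
    (x, y, z) coordinates that are already superposed and paired."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return sqrt(total / len(coords_a))


# Hypothetical paired atom positions in angstroms, for illustration only.
open_state = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.2, 0.0)]
closed_state = [(0.1, 0.0, 0.0), (1.5, 0.3, 0.0), (3.0, 0.2, 0.4)]
print(f"RMSD = {rmsd(open_state, closed_state):.2f} angstroms")
```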

We also determined a structure of D2 in complex with an equivalent 
14-bp chimaeric RNA-DNA duplex (Fig. 2b) at 3.6 Å resolution, in 
which each molecule of protein interacts with two A-form duplex 

contains motif IVa, faces the major groove of the dsRNA, and α18 and α20 of 
the CTE face the minor groove of the dsRNA. f-h, Orthogonal views of the D2- 
dsRNA-DNA complex, coloured as in b and Fig. 1a, in which D2 is bound to 
two stacked 14-bp chimaeric RNA-DNA duplexes. 


substrates (Fig. 2f-h, Supplementary Fig. 5 and Supplementary 
Table 1). These chimaeric duplexes make almost identical contacts 
with D2 as dsRNA (Supplementary Fig. 5b). Surprisingly, however, a 
DNA segment of strand 1 interacts with the DEAD-box motifs in the 
main ‘RNA’-binding tract of D2, while an RNA segment binds to the 
CTE (Fig. 2f-h and Supplementary Fig. 5). This orientation is probably 
favoured because D2 interacts primarily with nucleic acid substrate 
phosphate groups, whereas the non-conserved CTE makes hydrogen- 
bond contacts with the 2’-OHs of RNA (see earlier). The finding that 
A-form DNA and RNA interact in a similar manner in the conserved 
RNA-binding tract of D2 suggests that the substrate specificity of 
DEAD-box proteins for RNA duplexes is dictated primarily by phos- 
phate backbone geometry. Consistent with this idea, modelling demon- 
strates that the binding pocket of D2 is not shaped to recognize a 
B-form DNA duplex, the predominant conformation of dsDNA 
(Supplementary Fig. 6). Additionally, a genetic assay that is stringently 
dependent upon Mss116p function indicates that the side chains of 
conserved residues of D2 that interact with phosphate groups (R415, 
motif IVa; and T433, motif V) are critical for Mss116p function in vivo’, 
whereas the side chains of residues in the CTE that interact with 2’-OH 
groups of the RNA (S535, R538) can be mutated without detectable loss 
of function (Supplementary Fig. 7). Nucleic acid recognition based on 
duplex geometry may explain why DEAD-box proteins can unwind 
chimaeric RNA-DNA duplexes with as few as two centrally located 
ribonucleotides’, as structural studies indicate that chimaeric duplexes 
with only one ribonucleotide can adopt A-form geometry'®””. 

Our D2 structures provide insight into several other DEAD-box 
protein activities. DEAD-box proteins function on a wide variety of 
RNA substrates’”. A comparison of the D2-dsRNA and D2-dsRNA- 
DNA structures, and different complexes within their asymmetric 

Figure 3 | Interactions between Mss116p D2 and duplex RNA. a, Schematic 
of RNA-protein interactions observed in the D2-dsRNA structure. The 
dsRNA interacts with conserved DEAD-box motifs IV-Va of D2 (green) and 
the CTE of D2 (orange). Boxes indicate that the interaction is maintained in 
closed-state Mss116p'°. RNA bases are numbered according to their position 


units, shows that the binding orientation of the distal regions of duplex 
substrates can vary, while the contacts between the conserved motifs 
and centrally bound nucleotide residues are maintained (Supplemen- 
tary Fig. 8). Although these differences in substrate binding orientation 
could be influenced by crystal packing, the observed flexibility in 
nucleic-acid binding away from the basic binding tract of D2 may be 
advantageous for the loading and unwinding of diverse physiological 
RNA substrates, such as group I and group II introns, and could con- 
tribute towards the general RNA-chaperone activity of Mss116p*°. The 
presence of a dsRNA-binding pocket in D2 also raises the possibility 
that this domain could have a role in strand annealing in the absence of 
ATP, an activity observed for Mss116p and other DEAD-box proteins’, 
by orienting two ssRNAs in a position to pair in the duplex-binding 
pocket. The additional RNA interactions with the CTE of Mss116p may 
explain the relatively high strand annealing activity of Mss116p com- 
pared to other DEAD-box proteins"*. 

Collectively, our results indicate that RNA unwinding by Mss116p 
begins with the helicase core domains functioning independently to 
bind ATP and RNA substrates (Fig. 4a). This previously unobserved 
mechanism for substrate recognition by a helicase is consistent with 
the wide separation of the two core domains (~50 Å for their centres of 
mass) in the solution structure of the open state’’. Subsequent inter- 
actions of the exposed regions of these substrates with the remainder of 
their binding sites in the opposite domain would result in cooperative 
tight binding coupled to core closure, RNA-strand separation, and 
formation of the ATPase active site (Fig. 4a). Notably, the loop in 
D2 that contains part of motif Va and interacts with duplex RNA in 
the D2-dsRNA structure (see earlier) shifts markedly upon formation 


relative to ssRNA in closed-state Mss116p (see Supplementary Fig. 4d). Similar 
nucleic-acid—protein interactions were observed in the D2-dsRNA-DNA 
structure (Supplementary Fig. 5). H-bond, hydrogen bond. b, Interactions 
between strand 1 (yellow) of the duplex RNA and D2 (green). c, Interactions 
between duplex RNA and the CTE of D2 (orange). 


of the closed state and helps form the ATPase active site (Fig. 4b). This 
conformational change may be part of a switch that triggers ATP 
hydrolysis upon core closure. After ATP hydrolysis, dissociation of 
Pi and ADP causes reopening of the core, release of the bound strand, 
and regeneration of the enzyme*””. 

Comparison of the D2-dsRNA structure to the previously reported 
closed-state structure of the helicase core of Mss116p bound to adenosine 
nucleotide and ssRNA (Fig. 4c) indicates that D2 functions as a 
stationary platform that positions dsRNA for unwinding by the 
incursion of D1 (Fig. 4d, e and Supplementary Fig. 9). In the closed 
state of Mss116p, RNA-binding motifs Ia, Ib, Ic, and the post-II region 
of D1 are sterically incompatible with dsRNA bound in the duplex- 
binding site of D2 (Fig. 4d). This suggests that D1 promotes RNA 
unwinding in two ways (Fig. 4e and Supplementary Fig. 9). First, the 
conserved post-II region of D1 interrupts the centrally bound base 
pairs of the RNA duplex to displace strand 2. This displacement is 
presumably facilitated by the minimal interaction of strand 2 with 
the protein and could occur actively or during ‘breathing’ of the 
duplex”®. Second, core closure introduces two bends in strand 1, one 
by interactions with the conserved ‘wedge helix’ in D1 (motif Ic) and 
the other by interactions with the CTE’"®. DEAD-box proteins that 
unwind RNA but lack the CTE introduce only the first bend using the 
same conserved D1 wedge helix'’. On core closure, the buried solvent 
accessible surface area of strand 1 increases owing to additional inter- 
actions with D1 (1,256 Å² compared to 544 Å² in the D2-dsRNA 
structure; Supplementary Table 2), contributing to the driving force 
for RNA unwinding. Further, the two bends induced in strand 1 
(Supplementary Fig. 9b) impede its re-annealing to strand 2. 

Figure 4 | RNA-duplex binding and unwinding by Mss116p. a, Model for 
the modular roles of the helicase core domains of Mss116p during RNA 
recognition and unwinding. Although ATP and duplex RNA could bind in 
either order, the binding of ATP to D1 is shown as a first step because 
Mss116p-ATP complexes are probably pre-populated at physiological 
concentrations of ATP (>1mM)°*. b, Comparison of the position and 
interactions of the flexible motif Va loop (residues 435-440) in the D2-dsRNA 
and closed-state structures of Mss116p. The closed-state helicase core of 
Mss116p (PDB accession 3I5X) bound to ssRNA (U10-RNA; yellow) and 
adenosine nucleotide (AMP-PNP; black) is shown with domains coloured as in 


Additional conformational changes that occur upon ATP hydrolysis, 
dissociation of Pi, and/or reopening of the core may also contribute to 
RNA unwinding. 

The model we propose here for RNA-duplex recognition and 
unwinding by Mss116p explains the previously reported requirement 
for ATP binding, but not hydrolysis, for RNA unwinding by DEAD- 
box proteins*'””. This is because substrate binding drives core closure 
and strand separation (Fig. 4a), whereas ATP hydrolysis to regenerate 
the enzyme cannot occur until formation of the ATPase active site in 
the closed-state core. The model also elucidates why RNA unwinding 
by DEAD-box proteins is non-processive and can initiate directly from 
a double-stranded region of a substrate'”, as RNA duplexes are bound 
directly by D2. Additionally, the differences in RNA-unwinding activity 
observed for different ATP analogues” could reflect differences in their 
binding affinity for the closed state. 

Because the RNA-unwinding mechanism described here for 
Mss116p depends primarily upon conserved DEAD-box protein 
structures and motifs with only an ancillary role for the CTE, we 
propose that its major features are used by all DEAD-box proteins. 
All DEAD-box proteins rely on their helicase core for RNA unwinding 
and use appended domains for auxiliary functions, such as interactions 
with partner proteins or to target the helicase core to specific RNA 
substrates’. Structural studies of other DEAD-box proteins show that 

Fig. 1a. In the D2-dsRNA structure, the motif Va loop (green) interacts with 
strand 1 of dsRNA (yellow), whereas in the closed-state structure, the loop (red) 
shifts to a different position where motif Va helps to form the ATP-binding site. 
c, Surface representation of closed-state Mss116p with N1-N10 of U10-RNA 
(yellow) indicated. d, Surface representation of closed-state Mss116p with 
dsRNA modelled in the duplex-RNA-binding pocket of D2. Sterically 
incompatible regions of D1 are highlighted in red. e, Change in trajectory of 
strand 1 and predicted displacement of strand 2 of dsRNA by D1 upon core 
closure of Mss116p with arrows indicating regions of the substrate that are 
displaced. 


D1 can by itself bind adenosine nucleotide in a binding pocket formed 
by the Q-motif, which recognizes the adenine base, motif I (the 
phosphate-binding or P-loop) and motif II'*”’. By contrast, structures 
of Mss116p and other DEAD-box proteins in the closed state show 
that D2 interacts minimally with the adenine base’'®"’, in agreement 
with our observation that D2 does not by itself bind specifically to ATP 
(Fig. 1b and Supplementary Fig. 1). Likewise, all DEAD-box proteins 
contain a conserved RNA-binding track in D2 that contains motifs IV, 
IVa and V', and could recognize dsRNA similarly to Mss116p. By 
contrast, dsRNA is sterically incompatible with the motif Ic wedge 
helix and the post-II region in the RNA-binding track of D1 of other 
DEAD-box proteins’®*”’, in agreement with our finding that D1 of 
Mss116p cannot by itself bind an RNA duplex (Fig. 1c and Sup- 
plementary Fig. 2). The post-II region, which displaces strand 2 of a 
bound duplex in the unwinding mechanism proposed for Mss116p 
(Fig. 4d, e), is conserved and positioned to have the same role in other 
DEAD-box proteins, as is motif Va, which forms part of a loop 
that contributes to initial binding of the duplex RNA and rearranges to 
help form the ATPase active site in the closed state (Fig. 4b). 
Non-ring-forming helicases with structurally conserved cores of 
RecA-like domains D1 and D2 are classified into two superfamilies 
(SFs), SF1 and SF2, with DEAD-box proteins comprising the largest 
family of SF2 (refs 24, 25). These helicases are thought to have evolved 
from a common ancestor, but have diverged to possess a variety of 
accessory domains, to have different specificities for RNA or DNA 
substrates, and to operate by distinct mechanisms*”*. These include 
processive and non-processive duplex unwinding and translocation 
without unwinding. The modular substrate-binding functions of D1 
and D2, duplex binding by D2, and substrate specificity based on 
nucleic acid geometry found here for Mss116p may be features that 
underlie these diverse mechanisms. Interestingly, two recent crystal 
structures of the closed state of the pathogen recognition receptor RIG- 
I (also known as DDX58), an SF2 helicase closely related to DEAD-box 
proteins, show dsRNA bound in the RNA-binding track of D2 of RIG- 
I in the same orientation as D2 of Mss116p and interacting with con- 
served SF2 helicase RNA-binding motifs including IV, IVa and V 
(Supplementary Fig. 10a, c). However, the orientation of D1 in 
the RIG-I complex differs from that of Mss116p in the closed state 
(Supplementary Fig. 10b), enabling RIG-I to bind and translocate on a 
duplex substrate without RNA unwinding. RIG-I also contains 
ancillary domains that contribute to dsRNA binding and may influence 
substrate orientation to favour duplex binding over unwinding. 
Other SF1 and SF2 helicases could have evolved similarly to promote 
different closed-state conformations of their helicase cores that give rise 
to distinct mechanisms of action. 


METHODS SUMMARY 


The helicase core of Mss116p (D1D2; residues 88-597), D1 (residues 88-330) and 
D2 (residues 342-597) were expressed and purified as described for full-length 
Mss116p. Substrate-binding assays were performed in a buffer of 20 mM Tris- 
HCl (pH 7.5), 100 mM KCl, 10% glycerol, 1 mM dithiothreitol, 5 mM MgCl₂. ATP 
binding was measured by incubating proteins with ATP at 22 °C followed by gel- 
filtration chromatography at 4 °C with increasing concentrations of ATP to 
measure the A260nm/A280nm of the eluted protein. For ATP-agarose assays, 
proteins were incubated with ATP-agarose at 4°C for 24h, and binding was 
assessed by SDS-PAGE of the ATP-agarose pellet. Fluorescence anisotropy and 
EMSA measurements were performed using a 3’ fluorescein (FAM)-labelled 14-bp 
self-complementary dsRNA (5’-GGGCGGGCCCGCCC-FAM-3’) annealed by 
slow cooling after heating to 94 °C for 1 min. Crystals of D2-dsRNA that contain 
an unlabelled version of the dsRNA substrate used in the binding assays were 
obtained in 8% tacsimate (pH 6.0; Hampton Research), 20% (w/v) PEG 3350 
by hanging drop. Crystals diffracted to 3.2 Å and belong to space group P2₁2₁2 
with a unit cell of a = 160.5 Å, b = 88.4 Å and c = 121.2 Å. The final solution 
has an R_work and R_free of 22.4% and 26.8%, respectively. Crystals of D2- 
dsRNA-DNA that contain a 14-bp self-complementary chimaeric substrate 
(5'-rGrGrGrCrGrGrGdCdCdCdGdCdCdC) were obtained in 6% tacsimate 
(pH 5.0; Hampton Research), 20% (w/v) PEG 3350 by hanging drop. Crystals 
diffracted to 3.6 Å and belong to space group P2₁2₁2₁ with a unit cell of 
a = 43.7 Å, b = 70.1 Å and c = 214.9 Å. The final solution has an R_work and R_free 
of 24.4% and 28.0%, respectively. 


Full Methods and any associated references are available in the online version of 
the paper. 
METHODS 

Oligonucleotides. The RNA and RNA-DNA oligonucleotides rGrGrGrCrGr 
GrGrCrCrCrGrCrCrC and rGrGrGrCrGrGrGdCdCdCdGdCdCdC (Integrated 
DNA Technologies) were annealed to form 14-bp RNA or chimaeric RNA- 
DNA duplexes by heating 6mM solutions in 100mM potassium acetate, 
30mM HEPES (pH 7.5) at 94°C for 1 min and then slowly cooling to room 
temperature over 1h. 
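
The self-complementarity that lets a single oligonucleotide anneal into a 14-bp duplex can be checked with a short sketch. The helper names below (`strip_sugar_prefixes`, `reverse_complement`, `is_self_complementary`) are hypothetical and chosen for illustration only; the sequences are the two oligonucleotides named above, with the `r`/`d` sugar prefixes removed before comparison.

```python
# Watson-Crick partners; both oligonucleotides contain only G and C,
# so the RNA/DNA distinction (U versus T) does not affect this check.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def strip_sugar_prefixes(annotated: str) -> str:
    """Drop the 'r' (ribo) and 'd' (deoxy) prefixes, keeping only the bases."""
    return "".join(ch for ch in annotated if ch not in "rd")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a base sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def is_self_complementary(seq: str) -> bool:
    """A strand can pair with a second copy of itself if it equals its reverse complement."""
    return reverse_complement(seq) == seq


RNA_14MER = strip_sugar_prefixes("rGrGrGrCrGrGrGrCrCrCrGrCrCrC")
CHIMERA_14MER = strip_sugar_prefixes("rGrGrGrCrGrGrGdCdCdCdGdCdCdC")

print(RNA_14MER, is_self_complementary(RNA_14MER))
print(CHIMERA_14MER, is_self_complementary(CHIMERA_14MER))
```

In the chimaeric duplex the same base sequence means that each ribonucleotide half pairs with the deoxyribonucleotide half of the partner strand.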

Protein expression and purification. pMAL-Mss116p contains the coding 
sequence for Mss116p (codons 37-664) with an in-frame N-terminal MalE fusion 
cloned downstream of a tac promoter in the expression vector pMAL-c2t (a 
derivative of pMAL-c2x; New England Biolabs). Mss116p/D1D2 is a derivative 
of pMAL-Mss116p that expresses the active helicase core of Mss116p (residues 
88-597) with deletions of an unstructured N-terminal extension and C-terminal 
tail. Expression vectors for the Mss116p constructs D1 (residues 88-330) and D2 
(residues 342-597) were created by PCR of pMAL-Mss116p with primers that 
introduce BamHI and HindIII sites at the 5’ and 3’ end of the desired gene 
segment, and then cloning the PCR product between the corresponding sites of 
pMAL-c2t to link the protein-coding sequence to that of the MalE tag. 

Mss116p D1D2, D1 and D2 were expressed as N-terminal MalE fusions in 
Escherichia coli Rosetta 2 (EMD Biosciences), grown in ZYP-5052 auto-inducing 
medium for 24 h at 22 °C, and purified at 4 °C, as described. Purification steps 
included: (1) removal of nucleic acids by polyethyleneimine precipitation; (2) isola- 
tion of MBP-Mss116p by amylose-affinity chromatography (New England Biolabs); 
(3) removal of the MBP tag by digestion with tobacco etch virus protease; (4) 
isolation of Mss116p by heparin-Sepharose chromatography (GE Healthcare); 
and (5) purification and buffer exchange by gel-filtration chromatography using 
a Superdex S200 column (GE Healthcare). MBP-tagged proteins were purified in an 
identical manner but without step (3). Proteins were concentrated for crystallization 
by using a 10-kDa MWCO concentrator (Millipore), and protein concentrations 
were determined by Bradford Assay (Bio-Rad). Crystallization and storage buffers 
were 20 mM Tris-HCl (pH 7.5), 200 mM KCl, 10% glycerol, 1 mM dithiothreitol. 

Crystallization. For the D2-dsRNA complex, the protein (~500 µM) was incu- 
bated with dsRNA (650 µM duplex) and MgCl₂ (2 mM) for 30 min on the desktop. 
Hanging drops were assembled using 1 µl of complex and 1 µl of a well solution of 
8% tacsimate (pH 6.0; Hampton Research), 20% PEG 3350. Drops were stored at 
22 °C and plate-like crystals appeared within 1 week. Crystals were stabilized in a 
cryoprotectant containing the crystallization solution plus 20% glycerol before 
flash cooling in liquid N₂. Crystals of D2-dsRNA-DNA were obtained similarly 
with hanging drops assembled from 1 µl of dsRNA-DNA complex and 1 µl of a 
well solution of 6% tacsimate (pH 5.0; Hampton Research), 20% PEG 3350 and 
were cryoprotected as above. 

Structure determination. X-ray diffraction data were collected either on our 
in-house system (Rigaku MicroMax-007 HF generator with VariMax HF optics 
and an R-AXIS IV++ imaging plate detector; wavelength 1.54178 Å) or at the 
Advanced Light Source (ALS), Lawrence Berkeley National Laboratory (mail-in 
service on beamlines 5.0.2 or 5.0.3; wavelength 1.00003 Å). Details of data collec- 
tion and refinement are in Supplementary Table 1. Diffraction intensities were 
indexed, integrated, and scaled with HKL-2000 or autoPROC. For diffraction 
data processed with HKL-2000, additional statistics were calculated from 
unmerged data using d*TREK. Initial space groups were determined by using 
Pointless and confirmed by decreases in both R_work and R_free after refinement of 
molecular replacement solutions. Molecular replacement was performed with 
Phaser, using the previously determined structure of Mss116p D2 in the closed 
state (PDB accession 3I5X) as a search model. Composite omit maps were calcu- 
lated to determine that there was no model bias. Structures were completed with 
cycles of manual model building in Coot and refinement in Phenix. Validation 
of protein and nucleic acid models and their contacts was done by using 
MolProbity and indicated that at least 96% of residues are located in the most 
favourable region of the Ramachandran plot. Structural figures were prepared by 
using the PyMOL Molecular Graphics System, v. 1.4. SASA calculations and 
interface analyses were performed using PDBePISA. 

ATP-binding assays. Equilibrium binding of ATP to D1, D2 and D1D2 was 
measured by gel filtration and ATP-agarose binding assays. Because the helicase 
core of Mss116p does not contain tryptophan residues, its calculated extinction 
coefficient is small (ε280 = 18,255 M⁻¹ cm⁻¹; ExPASy Proteomics Server 
ProtParam tool). The binding of ATP (ε260 ≈ 15,400 M⁻¹ cm⁻¹) therefore gives 
rise to a large change in A260nm compared to A280nm. Protein samples (10 µM) 
were incubated at 22 °C for 30 min in increasing concentrations of ATP-Mg²⁺ 
(0-200 µM) and loaded onto a HiTrap desalting column (GE Healthcare) 
pre-equilibrated in a buffer containing the same amount of ATP-Mg²⁺ and 
20 mM Tris-HCl (pH 7.5), 100 mM KCl, 10% glycerol, 1 mM dithiothreitol, 
5 mM MgCl₂. The absorbance of the eluted protein above the background signal 
of the buffer was measured at 260 and 280 nm, and the change in A260nm/A280nm 
signal at increasing concentrations of ATP was fit to a simple one-site ligand- 
binding model to calculate a K_d for the protein-ATP complex. 
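
A simple one-site ligand-binding fit of this kind can be sketched as follows. This is an illustrative sketch only, not the fitting procedure used here: it assumes the free ATP concentration equals the concentration in the pre-equilibrated buffer, models the signal as baseline + amplitude × [L]/(K_d + [L]), and finds K_d by a grid search with a linear least-squares solve for baseline and amplitude. The function names and the synthetic data are hypothetical.

```python
def one_site_signal(ligand_um, kd_um, baseline, amplitude):
    """One-site model: baseline + amplitude * L / (Kd + L), with L and Kd in µM."""
    return baseline + amplitude * ligand_um / (kd_um + ligand_um)


def fit_one_site(ligand_um, signal, kd_grid_um):
    """Grid search over Kd; for each Kd, baseline and amplitude follow from
    ordinary linear least squares on the fractional saturation L / (Kd + L)."""
    n = len(ligand_um)
    mean_y = sum(signal) / n
    best = None
    for kd in kd_grid_um:
        sat = [l / (kd + l) for l in ligand_um]
        mean_x = sum(sat) / n
        sxx = sum((x - mean_x) ** 2 for x in sat)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sat, signal))
        amplitude = sxy / sxx
        baseline = mean_y - amplitude * mean_x
        sse = sum(
            (one_site_signal(l, kd, baseline, amplitude) - y) ** 2
            for l, y in zip(ligand_um, signal)
        )
        if best is None or sse < best[0]:
            best = (sse, kd, baseline, amplitude)
    return best[1], best[2], best[3]


# Synthetic, noise-free titration over the 0-200 µM range (made-up parameters).
atp_um = [0.0, 5.0, 10.0, 25.0, 50.0, 100.0, 200.0]
ratio = [one_site_signal(l, 20.0, 0.6, 0.4) for l in atp_um]

kd_fit, baseline_fit, amplitude_fit = fit_one_site(atp_um, ratio, range(1, 101))
print(kd_fit, round(baseline_fit, 3), round(amplitude_fit, 3))
```

With real data a nonlinear least-squares routine would normally replace the grid search; the grid keeps the sketch free of external dependencies.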

ATP-agarose binding assays were performed in 20 mM Tris-HCl (pH 7.5), 
100 mM KCl, 10% glycerol, 1 mM dithiothreitol, 5 mM MgCl₂ by incubating 
AP-ATP-agarose (150 µl; Jena Bioscience) with ~50 µg of protein overnight at 
4 °C with agitation. The beads were pelleted by centrifugation (1,000g, 1 min), the 
supernatant was removed, and the pellet was resuspended in binding buffer 
(150 µl). The load, supernatant, and ATP-agarose pellet were analysed by SDS- 
PAGE, and the gels were stained with SYPRO Ruby protein gel stain (Invitrogen) 
to detect and quantify bound and unbound Mss116p. 
RNA-binding assays. Equilibrium binding of dsRNA to D1, D2 and D1D2 was 
measured by fluorescence anisotropy and EMSA assays. Fluorescence anisotropy 
measurements were performed using MBP-tagged proteins to increase the change 
in anisotropy signal upon binding. A 3’ FAM-labelled 14-bp RNA duplex (10 nM; 
IDT; Fig. 2a) was incubated with increasing concentrations of protein (0-4 µM) at 
22 °C for at least 1 h in a reaction medium containing 20 mM Tris-HCl (pH 7.5), 
100 mM KCl, 10% glycerol, 1 mM dithiothreitol, 5 mM MgCl₂ and 0.1 mg ml⁻¹ of 
bovine serum albumin to stabilize the protein at low concentrations. The observed 
fluorescence anisotropy, r_obs, of FAM-dsRNA at increasing concentrations of 
protein was measured by using an EnVision Microplate Reader (Perkin Elmer) 
and was fit to the equation: 

$$ r_{\mathrm{obs}} = r_b f_b + r_f (1 - f_b) $$

where f_b is the fraction of protein-bound dsRNA and r_f and r_b are the anisotropy of 
the free and protein-bound dsRNA, respectively. f_b was defined as f_b = [RP]/[P_t], 
where [RP] and [P_t] are the concentration of protein-RNA complex and the total 
protein concentration, respectively. The dissociation constant, K_d, is defined by the 
quadratic equation: 

$$ [RP] = \frac{(K_d + [P_t] + [R_t]) - \sqrt{(K_d + [P_t] + [R_t])^2 - 4[P_t][R_t]}}{2} $$

where [R_t] is the total concentration of dsRNA. 
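
The two relations above can be sketched numerically. This is a minimal illustration under stated assumptions: the function names are hypothetical, all concentrations share one unit (nM in the example), and the bound fraction is passed in by the caller according to the definition given in the text rather than fixed inside the code. The numerical values are made up.

```python
import math


def complex_concentration(kd, p_total, r_total):
    """Physical root of the binding quadratic: concentration of the protein-RNA complex [RP]."""
    s = kd + p_total + r_total
    return (s - math.sqrt(s * s - 4.0 * p_total * r_total)) / 2.0


def observed_anisotropy(f_bound, r_free, r_bound):
    """Weighted average of the free and bound anisotropies."""
    return r_bound * f_bound + r_free * (1.0 - f_bound)


# Made-up example: Kd = 100 nM, 500 nM protein, 10 nM labelled duplex.
kd_nm, p_total_nm, r_total_nm = 100.0, 500.0, 10.0
rp_nm = complex_concentration(kd_nm, p_total_nm, r_total_nm)

# Consistency check: the root must satisfy Kd = [P_free][R_free]/[RP].
kd_back = (p_total_nm - rp_nm) * (r_total_nm - rp_nm) / rp_nm
print(round(rp_nm, 3), round(kd_back, 3))

# Anisotropy halfway between hypothetical free (0.05) and bound (0.25) values.
print(observed_anisotropy(0.5, 0.05, 0.25))
```

Taking the smaller root keeps [RP] below both total concentrations, which is the physically meaningful solution.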

EMSA measurements were performed on both MBP-tagged proteins 
(to increase protein solubility under the experimental conditions) and untagged 
proteins. A 3’ FAM-labelled 14-bp RNA duplex (100 nM; IDT; Fig. 2a) was incu- 
bated with increasing concentrations of protein (0-3 µM) at 22 °C as described 
above. Samples were then analysed in a non-denaturing 6% polyacrylamide gel 
run at 4°C for 60 min, and the fluorescence signal of the bound duplex substrate 
was quantified on a Typhoon imager (GE Healthcare). Control gels verified that the 
fluorescence signal of the unbound substrate increased linearly with concentration 
(Supplementary Fig. 2e). 
Genetic assays. Yeast genetic selections of functional Mss116p variants from 
centromere-containing (CEN) plasmid libraries in which specified codons were 
randomized and glycerol growth tests of individual variants were performed as 
described. 
Alternating-access mechanism in conformationally 
asymmetric trimers of the betaine transporter BetP 


Camilo Perez, Caroline Koshy, Ozkan Yildiz & Christine Ziegler 


Betaine and Na⁺ symport has been extensively studied in the 
osmotically regulated transporter BetP from Corynebacterium 
glutamicum, a member of the betaine/choline/carnitine transporter 
family, which shares the conserved LeuT-like fold of two inverted 
structural repeats. BetP adjusts its transport activity by sensing the 
cytoplasmic K⁺ concentration as a measure for hyperosmotic stress 
via the osmosensing carboxy-terminal domain. BetP needs to be in 
a trimeric state for communication between individual protomers 
through several intratrimeric interaction sites. Recently, crystal 
structures of inward-facing BetP trimers have contributed to our 
understanding of activity regulation on a molecular level. Here 
we report new crystal structures, which reveal two conformationally 
asymmetric BetP trimers, capturing among them three distinct 
transport states. We observe a total of four new conformations at 
once: an outward-open apo and an outward-occluded apo state, 
and two closed transition states—one in complex with betaine 
and one substrate-free. On the basis of these new structures, we 
identified local and global conformational changes in BetP that 
underlie the molecular transport mechanism, which partially 
resemble structural changes observed in other sodium-coupled 
LeuT-like fold transporters, but show differences we attribute to 
the osmolytic nature of betaine, the exclusive substrate specificity 
and the regulatory properties of BetP. 

Crystals of a surface-engineered, fully functional BetP mutant 
(BetP(ΔN29/E44E45E46/AAA)) were grown in the presence of 
betaine, diffracting to a resolution of 3.1 Å. Exchange of Gly 153 
against aspartate in the unfolded stretch of transmembrane helix 1 
(TMH1’) in the same mutant (Supplementary Fig. 1) resulted in a 
sixfold increased affinity for sodium and additional specificity for 
choline (Supplementary Fig. 2). Note that the numbering of the 
BetP TMHs was adapted to the LeuT numbering for better compar- 
ison. Therefore, TMH1’-TMH10’ correspond to TMH3-TMH12, 
whereas TMH1 and TMH2 are now designated as TMH(−2) and 
TMH(−1), respectively. BetP(G153D) was co-crystallized with 
choline and crystals diffracted to a resolution of 3.25 Å. The structures 
revealed BetP trimers that lack a non-crystallographic three-fold 
symmetry (Fig. 1, Supplementary Fig. 3a and Supplementary 
Table 1). That is, each protomer within one trimer adopts a different 
conformation of the alternating-access cycle (Supplementary Fig. 3b): 
a substrate-free outward-occluded (C_eoc), a substrate-free outward- 
open (C_e), a closed substrate-free (C_c), a closed substrate-bound 
(C_cS), an inward-open (C_iS) betaine-bound and an inward-open 
choline-bound (Fig. 1). The closed states, reported here for the first 
time (to our knowledge) for a LeuT-like fold transporter, are an inter- 
mediate between outward- and inward-facing states (Supplemen- 
tary Fig. 3b). C_cS is characterized by a central binding site (S1 site) 
that is closed by nearly 14 Å of protein bulk from either side of the 
membrane. The trimethylammonium group of betaine in the S1 site 
forms cation-π interactions with Trp 373, Trp 374 and Trp 377, which 
are arranged in a tryptophan prism (Fig. 1, inset). These residues are 
located entirely in TMH6’ and comprise the signature motif of the 
betaine/choline/carnitine transporter (BCCT) family. We also iden- 
tified a positive peak in the F_o − F_c difference electron density map in 
the C_cS state localized at the structurally conserved Na2 site (Khafizov 
et al., manuscript in preparation). Sodium is coordinated in the Na2 
site by carbonyls from the backbone of residues Ala 147 and Met 150 
from TMH1’ and Phe 464 as well as by the hydroxyl groups of residues 
Thr 467 and Ser 468 in TMH8’ (Fig. 1, inset). The betaine location 
observed in the S1 site is different from that in C_iS (Fig. 1, inset) or the 
published inward-occluded state (C_iocS) (Supplementary Fig. 4). In 
the C_iocS state, Trp 374 in TMH6’ and Trp 194 and Tyr 197 in TMH2’ 
Figure 1 | Conformational states and substrate-binding sites observed in 
BetP and BetP(G153D) asymmetric trimers. BetP(G153D) (PDB accession 
4DOJ): chain A, C_eoc; chain B, C_e; chain C, C_iS. BetP (PDB accession 4AIN): 
chain A, C_c; chain B, C_cS; chain C, C_iS. Betaine (4AIN) and choline (4DOJ) are 
shown in blue, red and black; sodium is shown in purple. S1 is the central 
betaine-binding site. The 2F_o − F_c map of Trp 377 in the C_e and C_eoc states is 
presented at the 1.4σ level. 
coordinate betaine, whereas in C_iS, betaine has shifted towards 
the cytoplasm so that it solely interacts with Trp 377 in TMH6’ and 
backbone carbonyls from TMH1’. 

Rotamer states of Trp 374 and Trp 377 change during state transi- 
tions, whereas Trp 373 remains in the same plane although with dif- 
ferent inclinations of the indole group. Both Trp 374 and Trp 377 
rotate by nearly 90°. It is important to note that the binding site in 
the C_iS and C_iocS states does not form optimal interactions with the 
substrate; that is, betaine ‘fits’ perfectly in the S1 site only in the tran- 
sient C_cS state. Consequently, the binding energy is exploited fully only 
in that transition state. Trp 377 has a dominant role and any exchange 
against another residue renders BetP inactive, whereas substitution of 
Trp 373 or Trp 374 decreases affinity (Supplementary Fig. 5a, b). 
Side-chain rotations were also observed for several aromatic residues 
in TMH1’ and TMH6’ (Phe 156, Phe 369, Phe 380 and Phe 384), 
which are highly conserved in the BCCT family (Supplementary Fig. 5c). 
Alanine substitution of any periplasmic occluding residue (Supplemen- 
tary Fig. 5a, b and Supplementary Table 2) decreases affinity for betaine, 
whereas mutation of aromatic residues in the cytoplasmic pathway 
(Supplementary Fig. 5d, e and Supplementary Table 2) did not have a 
major effect on affinity, suggesting that only the periplasmic aromatic 
residues directly or indirectly contribute to betaine recruitment. 
Consequently, betaine is preferably transported from outside to inside. 
The transition from inward to outward also occurs via a closed, albeit 
substrate-free C_c state. The closed states C_c and C_cS are very similar to 
one another (root mean squared deviation 1.0 Å) and the tryptophan 
prism remains nearly unchanged (Supplementary Fig. 6). As this con- 
formational change occurs in the absence of substrate, a relatively flat 
free energy landscape might be required to allow BetP to cycle back 
to an outward-facing state, which is also presented here for the first 
time for a BCCT. Both outward-facing states are occluded from the 
cytoplasmic side by ~16 Å of protein bulk (Fig. 1). In the C_e state, a 
periplasmic funnel extends to the Na2 site (Supplementary Fig. 7a). In 
the C_eoc state, the periplasmic funnel is less deep and the Na2 site is not 
accessible, mainly due to the rotation of Trp 377 by nearly 90° (Fig. 1, 
inset, and Supplementary Fig. 7a). The aromatic side chains described 
above form the floor in both outward-facing conformations, akin to the 
outward-open state of the arginine/agmatine antiporter AdiC. C_eoc and 
C_e states show a very similar main-chain conformation, although a 
2.0 Å displacement of the periplasmic half of TMH10’ facilitates open- 
ing of the external cavity (Supplementary Fig. 7b). The transition from 
the occluded to the open state might occur in BetP in the absence of 
substrate by stochastic thermodynamic fluctuations, after which 
sodium binding might trigger the full opening of the transporter to 
the periplasmic side to allow subsequent binding of betaine. 
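
A root mean squared deviation such as the one quoted for the two closed states can be illustrated with a minimal sketch. It assumes two equal-length lists of already superposed atomic coordinates in Å; the function name `rmsd` and the toy coordinates are hypothetical and are not taken from the deposited structures.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean squared deviation between two pre-superposed coordinate sets.

    No superposition is performed here; the inputs are assumed to be aligned
    and listed in the same atom order. The result has the units of the input.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Toy example: three atoms, with the second set shifted by 1 Å along x.
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 1.0, 0.0)]
shifted = [(x + 1.0, y, z) for x, y, z in reference]
print(rmsd(reference, shifted))
```

In practice the two structures would first be superposed by a least-squares fit over matching atoms, which this sketch deliberately leaves out.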

On the basis of the three major conformations—outward-facing, 
closed and inward-facing—the alternating-access mechanism in 
BetP can be described as a hybrid of rigid body movements (Fig. 2) 
and individual flexing of symmetry-related helices (Figs 2 and 3). The 
rigid body movement involves the scaffold motif (TMH3’ and TMH4’ 
in repeat 1 and TMH8’ and TMH9’ in repeat 2) tilting about 13° away 
from an axis running through the centre of BetP normal to the plane of 
the membrane, relative to the 4-TMH bundle. Similar rigid body rota- 
tion has been observed for LeuT and Mhp1, for which the scaffolds 
tilt by ~10° and ~18°, respectively. However, compared to the tilting 
of the bundle domain observed for Mhp1 and LeuT this movement 
is less pronounced in BetP (Supplementary Fig. 8). Recently, we 
reported two regulatory ionic interaction networks in BetP that are 
involved in a crosstalk between individual protomers within the tri- 
mer. These networks link bundle helices together, restricting their 
flexibility, a unique feature only observed for the osmoregulated BetP. 
We suggest that the restricted opening of BetP is related to its exclusive 
specificity for betaine, whereas other betaine BCCTs of the OpuD type 
show low affinity for other osmolytes, for example, proline. Betaine is 
a rather small molecule that acts as an osmolyte and protein stabilizer, 
promoting folding by forcing proteins to decrease their exposure to 
solvent by markedly decreasing the water activity. Betaine is excluded 
from the first hydration shell of the protein backbone; however, in 


Figure 2 | Conformational changes during the alternating-access cycle. 

a, Periplasmic and cytoplasmic views of conformational changes during the 
transition from the C_e (bundle domain in red (TMHs 1’, 2’, 6’ and 7’), hash 
motif in blue (TMHs 3’, 4’, 8’ and 9’) and thin gates in green (TMHs 5’ and 
10’); see Supplementary Fig. 1) to the C_c state (light blue), and from the C_c to the 
C_i state (light yellow). b, Superpositions of the C_e (coloured as in Supplementary 
Fig. 1) and C_i state (light blue) showing in cylinders the scaffold domain, the 
bundle domain, the ‘thin’ gates and the extra helical segments, respectively. 
Figure 3 | Opening and closing of the periplasmic and cytoplasmic gates in 
BetP. a, Sequence of opening and closing of gating main chains on both sides of 
the membrane. From left to right, C_e, C_c and C_i states. b, Model of alternating- 
access ‘gating’ mechanism in BetP. From left to right, schematic drawings of the 
C_e, C_c and C_i states. Betaine is shown in black, blue and red, sodium ions in 


transporters interactions with substrates in the micromolar affinity 
range have to take place. From this point of view a fully hydrated 
substrate pathway owing to a wide opening might be disadvantageous 
for the stability of BetP and efficient betaine import. 

To open the pathways adequately, BetP additionally requires a 
gating mechanism, observed by the movement of individual helical 
parts that are not restricted by the regulatory networks (Figs 2 and 3). 
Similar to what was described recently for LeuT, TMH1a’ is displaced 
considerably to open the cytoplasmic gate. Independently, the 
symmetry-related TMH6a’ contributes to open the periplasmic gate, 
although with a less pronounced movement. During transition from 
the outward to inward state, TMH1a’ is displaced by ~5 Å and tilted 
by 18°. In LeuT, TMH1a is displaced by ~12 Å and tilted by ~45°, 
which makes this movement the most notable conformational change 
during the inward-to-outward cycle, positioning TMH1a into the 
hydrophobic bilayer. The movement in TMH1a’ is much smaller in 
BetP and not even existent in Mhp1. Although flexibility of TMH1a 
was suggested by single-molecule fluorescence resonance energy 
transfer (smFRET) studies, it is unclear whether the marked confor- 
mational change of TMH1a in LeuT is a consequence of the mutations 
introduced in the crystallization variant causing the weakening of the 
Na2 binding site and the perturbation of the intracellular gate, and to 
what extent this movement would take place in the native membrane 
environment. In the same context, we cannot entirely rule out that in a 
more native environment BetP might adopt a conformation which 
exhibits more pronounced flexing of TMH1a’. 

The discrepancy between the hinge-like bending movements of 
TMH1 and TMH6 in LeuT and BetP might also originate from the 
fact that in LeuT both helices possess a very flexible stretch, whereas in 
BetP this is true only for TMH1’. The midsection of TMH6’, being 
composed largely of hydrophobic residues (Ser-Pro-Phe-Val), is more 
rigid, as reflected by the smaller bending angle. On the other hand, 
hinge-like bending motions of TMH1’ were not observed for Mhp1 
purple. Arrows indicate the closing of the periplasmic gates TMH6a’ and 
TMH10’ after substrate and ions bind to the C_e state, and the opening of the 
cytoplasmic gates TMH1a’ and TMH5’ to allow the release of substrate and 
ions from the C_c state. TMHs are coloured as in Fig. 2 and Supplementary Fig. 1. 


(Supplementary Fig. 8), as the corresponding region (Ala-Ile-Gln-Val- 
Ala) is quite hydrophobic and presumably rather rigid. 

Additional gating-like movements in BetP are observed for TMH5’ 
and TMH10’ (Figs 2 and 3), which are assigned as thin gates in the 
alternating-access mechanism of Mhp1. In BetP, TMH5’ constrains 
the cytoplasmic pathway in the outward-facing open state. During 
substrate translocation it moves out of the central pathway, undergoing 
a 9° tilt. The structurally equivalent extracellular part of TMH10’ 
constrains the periplasmic pathway in the inward-open state, which 
is again released when BetP returns to the outward-open state by a 
movement of 5.7 Å and a massive tilt of 41° relative to the membrane 
normal. In contrast to the conformational change of TMH5’, TMH10’ 
is able to move further away, as it is stabilized by the presence of 
TMH(−1) and TMH(−2). Interestingly, the presence of a closed 
transition state implies that these periplasmic and cytoplasmic gates 
undergo uncoupled hinge movements in BetP. 
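
Tilt angles relative to the membrane normal, such as those quoted for TMH5’ and TMH10’, can be computed from a helix axis vector. The sketch below is illustrative only: it assumes the membrane normal lies along z, approximates the helix axis by the vector between two hypothetical end points, and reports the acute angle so that the result does not depend on the direction in which the helix is traced.

```python
import math


def helix_axis(start, end):
    """Approximate a helix axis by the vector from its first to its last axis point."""
    return tuple(e - s for s, e in zip(start, end))


def tilt_from_normal_deg(axis, normal=(0.0, 0.0, 1.0)):
    """Acute angle in degrees between a helix axis and the membrane normal."""
    dot = sum(a * n for a, n in zip(axis, normal))
    norm_axis = math.sqrt(sum(a * a for a in axis))
    norm_normal = math.sqrt(sum(n * n for n in normal))
    if norm_axis == 0.0 or norm_normal == 0.0:
        raise ValueError("vectors must be non-zero")
    # abs() makes the angle independent of axis direction; min() guards rounding.
    cos_angle = min(1.0, abs(dot) / (norm_axis * norm_normal))
    return math.degrees(math.acos(cos_angle))


# Hypothetical axis tilted by 41 degrees from z within the x-z plane.
theta = math.radians(41.0)
example_axis = helix_axis((0.0, 0.0, 0.0), (math.sin(theta), 0.0, math.cos(theta)))
print(round(tilt_from_normal_deg(example_axis), 1))
```

A change in tilt between two states would then be the difference between the angles computed for the same helix in each superposed structure.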

A new property revealed is the spring-like movement of the 
unfolded region of TMH1’. This stretch shows remarkable plasticity 
in BetP, which is related to the transient coordination of sodium ions 
during the alternating-access cycle (Fig. 4). In fact, the unfolded stretch 
in TMH1’ provides backbone carbonyls for the coordination of 
sodium in the Na2 site (Fig. 4). The plasticity is therefore affected by 
binding and dissipation of the positive charge of the coupling sodium 
in the Na2 site, inducing considerable backbone dihedral changes, as 
proposed for LeuT, ApcT and vSGLT transporters. In general, 
partly unfolded transmembrane helices provide the structural flexibility 
to couple electrochemically favoured binding and release of ions to 
alternating opening and closing of ‘gates’. Indeed, all the structural 
elements involved in the transformation of BetP from the outward- to 
inward-facing state are either directly involved in the formation of the 
sodium-binding sites, such as in the case of the bundle helices TMH1’ 
and TMH6’ and the hash-motif helices TMH3’ and TMH8’ (Fig. 4), or 
indirectly involved such as the thin-gate helices TMH5’ and TMH10’ 
Figure 4 | Conformational changes of the sodium-binding sites. 

a, Comparison of the Na2 and Na1 sodium-binding sites in the C_e, C_c and C_i 
states of BetP. Sodium ions, in white, are presented to schematize the location of 
the Na2 and Na1 sites. F_o − F_c map, in green, is presented at the 3.0σ level for 


(Fig. 3). This correlates well with the absolute requirement of the pres- 
ence of sodium to allow betaine symport. 

In the outward-open and closed states, residues forming the Na2 site 
provide reasonable coordination for sodium. By contrast, in the inward- 
open state the coordinating residues are too distant to coordinate a 
sodium ion, which is a consequence of the main-chain conformational 
changes of TMH1a’ (Fig. 4). We suggest that the Na2 site is perturbed 
in the inward-open state, which is in agreement with reports for LeuT 
and by molecular dynamics (MD) simulations on several LeuT-like 
fold transporters. Recently, the position of the Na1 site in BetP 
was identified (Khafizov et al., manuscript in preparation), formed by 
side chains of residues Phe 380 in TMH6’, Thr 246 and Thr 250 in 
TMH3’. A proper coordination sphere of sodium in the Na1 site is only 
established in the closed state; in the outward- and inward-facing states 
residues in TMH3’ and TMH6’ are too distant to coordinate sodium 
ions (Fig. 4). 

Both ions bind at the interface between the scaffold and bundle 
helices (Supplementary Fig. 9), which move relative to one another 
during the alternating-access cycle. The cytoplasmic and periplasmic 
gates show a symmetry-related mechanism related to the binding and 
release of sodium ions at the two binding sites. In BetP, sodium bind- 
ing to Na2 relates to ‘opening-closing’ of the cytoplasmic gates, while 
binding to Na1 relates to ‘opening-closing’ of the periplasmic gates 
(Figs 3 and 4). In the outward-facing conformation, TMH5’, which is 




the Na2 site located in PDB accession 4AIN. A schematic of sodium-induced 
conformational change of TMH1a’ is also shown. Vectorial movement of 
sodium through the Na2 site induces a spring-like movement of the 
cytoplasmic part of TMH1’, which renders the cytoplasmic pathway accessible. 


the cytoplasmic ‘thin’ gate, interacts closely with TMH1a’ via a dense 
network of hydrophilic and hydrophobic interactions. During transition 
to the inward-open state, release of sodium from the Na2 site triggers 
the movement of TMH1a’, which detaches from TMH5’, leading to a 
full opening of the cytoplasmic pathway. Also, in the outward-facing 
conformation TMH3’ and TMH6a’, which coordinate the sodium ion 
at Na1, are far away from one another, not allowing proper coordina- 
tion of sodium. In this conformation TMH10’, which is the periplasmic 
‘thin’ gate, adopts a conformation that renders the periplasmic pathway 
accessible, being pulled by interactions with TMH9’. We suggest that 
binding of sodium to the Na1 site brings together TMH3’, TMH6’ and 
TMH9’, initiating a cascade of structural rearrangements that result in 
closure of the periplasmic pathway. 

Our data indicate that differences such as the location of sodium- 
binding sites, the unique coordination of the substrate by the Na1 
sodium ion in LeuT, or the absence of a Na1 sodium ion as in the 
case of Mhp1 are responsible for the relative contributions of rigid 
body rotation versus gating hinge movements observed in each trans- 
porter. Consequently, the conservation of the Na2 site is the unifying 
element in the LeuT-like fold transport mechanism. The alternating- 
access mechanisms of LeuT-fold transporters share common mech- 
anistic principles; however, for each individual transporter the primary 
structure and specific coordination of substrate and co-substrates will 
dictate to what extent ‘rocking bundle’ and ‘gating’ can occur. 
LETTER 


METHODS SUMMARY 


Escherichia coli DH5α was used for the heterologous expression of strep-betp. 
Membranes were solubilized using β-dodecyl-maltoside (DDM), and BetP was 
purified by affinity chromatography via Strep-Tactin macroprep and size- 
exclusion chromatography. Uptake of labelled betaine was measured in E. coli 
MKH13 cells, started by adding 5-250 µM of [¹⁴C]betaine upon an osmotic shock 
adjusted with KCl. Binding assays were performed in proteoliposomes using 
tryptophan fluorescence emission between 315 and 370 nm with an excitation 
wavelength set to 295 nm. Data from BetP(ΔN29/E44E45E46/AAA/G153D) 
crystals obtained in the presence of 1 mM choline were collected to 3.2 Å at 
beamline ID29 at the European Synchrotron Radiation Facility (ESRF). The 
structure was determined by molecular replacement using BetP (PDB accession 
3P03) as the search model (Supplementary Table 1). BetP(ΔN29/E44E45E46/ 
AAA) was crystallized in the presence of 5 mM betaine and a data set to 3.1 Å 
was collected at the PXII beamline of the Swiss Light Source (SLS). The structure 
was determined by molecular replacement using BetP (PDB accession 4AMR) as 
the search model (Supplementary Table 1). For more details, see Supplementary 
Methods. 
Beyond the farm


Veterinary expertise is an advantage for researchers hoping 
to stem disease outbreaks and bolster food safety. 


BY AMY MAXMEN 


In 2002, Newcastle disease spread rapidly
through chickens in Los Angeles, Califor-
nia. More than 6,500 poultry died in the
effort to control the outbreak. “It took virtu-
ally the entire available veterinary staff at
the USDA [US Department of Agriculture]
to isolate and eradicate the disease in three
months,” says Alan Kelly, dean emeritus of
the School of Veterinary Medicine at the Uni- 
versity of Pennsylvania in Philadelphia. He 
fears that if highly infectious foot-and-mouth 
disease ever reached the United States (it
hit the United Kingdom in 2007 and Japan
in 2010), the nation might not have enough
veterinarians to contain it. The trend towards
hot summers in some areas could exacerbate 
viral epidemics, providing favourable condi- 
tions for the spread of zoonotic diseases such 
as West Nile virus and hantavirus. 

With the notable exception of clinical prac- 
tice, shortages haunt pretty much all the sectors 
in which veterinarians typically look for jobs. 
Within government, academia and industry
alike, positions for veterinarians with master’s
degrees or PhDs remain vacant, according to
a report by the US National Research Council
(NRC) published at the end of May and writ-
ten by a committee chaired by Kelly. The com-
mittee was formed after members of the US
Congress expressed concerns about whether 
the nation had enough personnel to moni- 
tor zoonotic disease outbreaks, regulate food 
safety and evaluate medicines. “The Depart- 
ment of Labor predicts a lot of opportunities 
for veterinarians, and I think many of those 
opportunities will open up in the near future,” 
says Michael Gilsdorf, executive vice-president 
of the National Association of Federal Veteri- 
narians, based in Washington DC. 

Veterinarians in research-related posts say 
that many people are not aware of the versatil- 
ity of a veterinary career. “People have a very 
narrow view of what veterinarians do,” says 
Bonnie Buntain, a public-health expert at the 
University of Calgary in Canada who started 
out as a horse veterinarian. “Who would have 
thought that a horse vet from Hawaii would 
be guiding national regulations on food safety 
and humane animal treatment in Washington 
DC at the USDA, and then be offered a ten- 
ured professor position?” Those who like to 
apply analytical skills to real-world problems 
in research laboratories, the environment, the 
food supply and beyond will find ample oppor- 
tunities — but they may have to invest in some 
extra schooling. 


DETECTIVE WORK 

Research veterinarians have much to offer. 
Federal and state governments need them to 
identify animals with disease; enforce humane 
slaughter regulations; track wildlife; monitor 
Escherichia coli and Salmonella in food; detect 
the effects of toxic compounds in the ecosys- 
tem; inspect dog-breeding facilities and more. 
Most of the veterinarians involved in gov- 
ernment work are employed by the USDA. 
More-specialized positions are available at 
the National Institutes of Health (NIH), the 
Food and Drug Administration (FDA) and the 
Centers for Disease Control and Prevention 


4 OCTOBER 2012 | VOL 490 | NATURE | 131 


© 2012 Macmillan Publishers Limited. All rights reserved 


(CDC). At the NIH in Bethesda, Maryland,
veterinarians preside over animal facilities, or 
study cancer and other diseases in animals. At 
the FDA, based in Silver Spring, Maryland, 
they investigate food safety, and help to regu- 
late the genetically modified (GM) animals 
that are slowly making their way through the 
FDA’s regulatory pipeline. And at the CDC in
Atlanta, Georgia, they monitor zoonotic dis- 
ease outbreaks. 

Internationally, industry positions for people 
with a doctorate in veterinary medicine (DVM) 
and a master’s or PhD in toxicology, pharma-
cokinetics or another field are becoming ever
more lucrative. Jobs at pharmaceutical and bio- 
technology companies pay base salaries rang- 
ing from US$85,000 to $150,000, depending on 
the level of seniority. Mary McConnel, a stra- 
tegic initiatives director at Pfizer, says that her 
combination of a DVM and a master’s in busi- 
ness seemed to be in demand. As soon as she 
sent her CV to Pfizer, “they were after me like a 
bat out of hell”, she says. McConnel enjoys her
job as a consultant for veterinary businesses, 
and the generous salary that goes with it. 

In general, those with a DVM and a science 
degree have a variety of options in pharma- 
ceutical and biotechnology companies. They 
might conduct experiments on the efficacy 
and safety of drugs in animals before human 
trials begin, or they might help to develop 
treatments for animals. At animal-supply 
companies, they often develop and care for 
GM laboratory animals. At diagnostic labora- 
tory companies, they develop diagnostic tests 
for both pets and laboratory animals. 


SHORTFALL SOLUTIONS 

For all their usefulness, however, research 
veterinarians have been on the decline — 
especially in government. According to a 2009 
report by the US Government Accountability 
Office, the number of veterinarians working 
for the federal government had fallen by 40% 
since 1990. And one-third of veterinarians 
employed by the USDA, the FDA and the US 
Army were due to retire in 2011. 

Meanwhile, veterinary schools in the United 
States and the United Kingdom say that they 
struggle to find candidates that have both clini- 
cal and research experience. According to the 
NRC report, roughly 11% of the veterinarians 
with faculty positions will be retiring by 2016. 
Plus, student enrolment at veterinary schools 
is increasing, which is pushing up the demand 
for teachers. At the University of Glasgow, 
UK, Nicholas Jonsson says that the university 
is now having to hire veterinarians who do not 
have PhDs. “In the past that would have been 
unthinkable,” he says, “but we desperately need
faculty for teaching and clinical posts.” 

Although most of Jonsson’s students pursue 
careers in pet medicine, he encourages them 
to consider a career that goes beyond the day- 
to-day routine of clinical practice. 

Jonsson started his career as a farm-animal
veterinarian in rural Australia. For him, it was
a less-than-ideal job, he says. “You're on call 
every other weekend, you drive a lot, you're 
kicked and beaten and stomped.” He then 
returned to university for a PhD, and later 
moved to Glasgow, where he researches the 
evolution of drug resistance in parasites. 

US agencies have made some effort to entice 
new talent. In 2003, the NIH’s National Cancer 
Institute launched a programme for graduate- 
level biomedical education in partnership with 
DVM programmes at veterinary colleges across 
the country. The seven alumni who have com- 
pleted the NIH Comparative Biomedical Scien- 
tist Training programme have gone on to work 
as postdocs, tenure-track assistant professors, 
NIH staff scientists and industry pathologists. 
And in 2008, the CDC introduced a two-year 
research-focused residency programme with 
the aim of addressing a shortage of veterinar- 
ians to monitor disease outbreaks. 


DUAL-DEGREE ADVANTAGE 

The path to many research-veterinarian 
opportunities entails a dual degree, usually 
with a focus on both veterinary medicine and 
a basic science such as toxicology, genetics, 
epidemiology or parasitology. Of the 28 vet- 
erinary schools in the United States, 13 offer 
joint DVM-PhD programmes. 

The price of veterinary school alone — 
around $66,000 per year for a four-year degree 
in the United States, and under half that in the 
United Kingdom for UK citizens — presents an 
obstacle. After accumulating $140,000 in debt, 
Kelly says, young veterinarians tend to want 
to start earning money rather than enrol in 
graduate programmes that offer, at best, mod- 
est stipends. “It’s a desperate situation that has 
to change,” Kelly says. “Salaries need to increase
and the cost of veterinary education has to
decrease.” The NRC report recommends that
the government, economists, industry, deans
and veterinary organizations create strategies
to reduce the costs of veterinary schools, such
as sharing facilities and starting online courses.

[Image caption: Nicholas Jonsson encourages students to go
beyond the usual career choice of pet medicine.]

Grants from the NIH, called T32 institu- 
tional training awards, provide salaries for pre- 
and postdoctoral DVMs pursuing biomedical 
research. Merck, Pfizer and other large drug 
companies offer fellowships to support DVMs 
who want to return to a research career. 

Despite the high price tag of veterinary 
school, Claude Nagamine says that pursuing 
the DVM was his best career move — he rel- 
ishes not only the frequent interaction with 
animals but also the job security. He decided 
to pursue a DVM after failing to earn tenure 
as an assistant professor in the cell-biology 
department at Vanderbilt University in Nash- 
ville, Tennessee. 

As he pondered his next move, he says, he 
realized he loved working with animals. He 
had spoken with veterinarians working in the 
animal facility and was intrigued. 

Immediately after veterinary school, he began 
a research residency at an animal-care facility 
at the Massachusetts Institute of Technology in 
Cambridge, and ended up at Stanford Univer- 
sity in California, where he not only conducts 
his own research, but also helps scientist col- 
leagues to navigate the regulations and paper- 
work involved in animal experimentation. “If 
you’re driven to do publishable research, there
are lots of jobs out there,” he says.

Research veterinarian Anne Fairbrother 
says that her work combines research with 
investigative pursuits. Fairbrother, who holds 
a DVM and a PhD in wildlife disease, worked
at the US Environmental Protection Agency 
for 13 years, monitoring the risks posed by GM 
plants on wildlife. This often involves checking 
the concentration of chemicals in soil, plants 
and water, and then comparing them with con- 
centrations in animal serum and tissue. 

Or, if animals start to die in abnormal num- 
bers, she might be asked to search for clues 
related to parasite or viral infections, com- 
pounds such as petroleum that might have 
leaked from gas stations and signs of industrial 
pollutants. “If you go to a place where there 
is contamination, you know what questions 
to ask to find out the history of the area and 
which signs to look for, because sometimes 
chemicals are causing problems but it could 
be something else,” she says. “That diagnostic 
approach is something that you learn in vet- 
erinary school.” 

But a love for animals is probably the main 
requirement for researchers looking to multi- 
ply their career options with a DVM degree. At 
a time when degrees and certifications rarely 
confer job guarantees, the veterinary research 
option stands apart.


Amy Maxmen is a locum biology 
correspondent at Nature. 


TURNING POINT 


Ethan Perlstein 


Ethan Perlstein has spent five years
creating a sub-field of research that he calls
evolutionary pharmacology. As his fellowship 
at Princeton University’s Lewis-Sigler 
Institute for Integrative Genomics in New 
Jersey comes to an end and he searches for his 
next academic post, Perlstein is maintaining 
an innovatively designed website adorned 
with modules and discussion threads to help 
to communicate his thoughts on science. 


When did you first start fostering 
communication among scientists? 

I was an intern at a small biotech company 
before my final year of high school. As part 
of that, I would read immunology articles, 
formulate questions and start a correspond- 
ence with the author. One of these authors was 
Ronald Germain, an immunologist at the US 
National Institutes of Health. He must have 
been struck by the idea of a kid reading papers;
he offered me another internship, in his lab in 
Bethesda, Maryland, where I worked for the 
summer before going to Columbia University 
in New York to study sociology. 


How did you come to champion evolutionary
pharmacology?

After several rotations in different labs as 
a cell-biology graduate student at Harvard 
University in Cambridge, Massachusetts, 
I realized that I wanted to work with small 
molecules relevant to human diseases. I also 
wanted to use a simple model system, such 
as yeast, so that I could do a lot of experi- 
ments quickly. I noticed that several small 
molecules that affect yeast growth are also 
psychiatric drugs, and I started studying the 
connection. There is a large evolutionary dis- 
tance between yeast and humans, but these 
drugs affect ancient processes that we share. 


How is your job search going? 

Like many of my colleagues, I have been bat- 
tered by the job market. I received zero inter- 
views out of 18 applications this year, despite 
having a five-year independent position with 
a US$1-million budget on my CV. The fellow- 
ship has been great, but it is not a normal 
postdoc, so I am not sure that the wider com- 
munity of scientists knows what to make of 
it. I am a research cross-pollinator, and don’t
have one well-known area of expertise. I won- 
der if it may be harder for me to break out. 


Has your decision to invest time, money and 
energy in your website paid off? 
I think so. Most academics use a lab website
to list publications; essentially it becomes
static, a version of their CV. I wanted to do 
something new and cool that would help to 
communicate science. I am not a program- 
mer, so I spent several thousand dollars of 
my own money on hiring a professional 
design team to create something interactive. 
It includes my tweets, blog posts and research 
summaries (replete with pop-culture refer-
ences) in a series of modules that encourage
viewers to add their own comments. It seems 
to work: one private-sector researcher who 
checked out my website contacted me about 
mutually beneficial research opportunities. 


One post on your website breaks down your 
academic lab budget. Why share this? 

My fellowship finishes at the end of the year, 
and I am interested in crowd-funding a pro- 
ject on how amphetamines such as crystal 
meth work. I am asking for roughly $25,000,
and I thought that I should give potential 
funders evidence that I am responsible with 
money. I see this as part of the same move- 
ment as a group of scientists who are posting 
their grant proposals — whether they are 
successful or not. I am excited about experi-
menting with the way we do science.


What has been your career turning point? 

Without a doubt, joining Twitter in 2011, 
when I started offering my thoughts about 
changing the way science is done. I found 
a community of people passionate about 
rethinking scholarly publishing and funding. 
I had hoped for a way to scale up the e-mail 
cold-calling that I had done at high school. 
Twitter was a way to connect with like-minded 
people and keep a conversation going 24/7.
FUNDING
People power 


Researchers are starting to turn to crowd- 
funding as a way to support their work, says 
Simon Vincent, head of personal awards 
and training at Cancer Research UK in 
London. At the 20 September Naturejobs 
Career Expo in London, Vincent said that 
in a lean funding environment, seeking
donors can be easier than navigating the
grant-application process. “There’s no peer
review or middlemen,” he told conference
attendees. “If you have a good idea and
can convince enough people, you get the
money.” But Vincent warned that crowd-
funding (seeking funds through the
online community) also has downsides.
Traditional research-grant peer review 
provides quality control, a reality check 
and a way to hone and refine an idea,
and the interaction with the funder can
provide links to large, established networks 
in the scientific community. Crowd- 
funding, even with established sites such 
as petridish.org, requires a lot of time and 
public interaction, Vincent said. Scientists 
often have to make a video about their 
research project and must stay in regular 
contact with donors, who can number in 
the hundreds. 


ETHICS 
Relationship advice 


A university conflict-of-interest 
committee should review contracts 
between academic scientists and industry 
sponsors that are worth US$5,000 or 
more, concludes a draft report entitled 
Recommended Principles & Practices to 
Guide Academy-Industry Relationships. 
Researchers should never ghostwrite 
research papers and should retain 
oversight of intellectual property and
a stake in the proceeds from patents,
according to the proposal. The report 
offers 56 guidelines for maintaining 
academic freedom and upholding 
ethical conduct in partnerships and 
collaborations between academics
and industrial sponsors. Issued on
18 September, it was written by the
American Association of University 
Professors in Washington DC in response 
to the increasing number and complexity 
of such partnerships, says co-author 
Cary Nelson, a past president of the 
association. “The corrupting power of 
money has become much more clear,” he 
says, noting that issues such as sponsors 
suppressing data from studies and 
persuading eminent researchers to add 
their names to papers they did not write 
seem to be on the rise. 
MAN'S BEST FRIEND

Animal instincts.

BY GRACE TANG


“Come on girl, you can do it.” I
gently coaxed Dr Gleitman’s
latest subject as she shook off the
last of the drugs and struggled to lift herself
from the bed. At three years of age, Callie
was Dr Gleitman’s youngest subject to date.
Her dark brown eyes glinted as she gradually
blinked them open, adjusting to the harsh
fluorescent lights of the post-op recovery
room.

The matte black chassis of the neural
implant peeked out from a small bald patch
between the fine gold strands on her head.
Even though I was not the one who put it on
her, as Dr Gleitman’s research assistant, I
felt a twinge of guilt that I was subjecting
her to this painful procedure, with-
out her consent, no less, as she was
unable to give it before the operation.
But then again, God did not obtain
consent when he created Man from
the clay.

And now Man was imparting God’s
greatest gifts to his best friend. I’d spent
a good portion of my working life in
this lab, and even though I’d seen count-
less animals pass through these halls, I’d
grown fond of Callie since we’d got her
from the local pound, a day before she was
scheduled to be put down. She was up on her
feet now. She sniffed cautiously at me.

I wondered what vocation she would be
assigned to as I leaned in close to her and
let her lick my face. Because our funding
came from the military, primates were usu-
ally used in jungle warfare. Cats, with their
excellent night vision and stealth, were used
in reconnaissance. Dogs usually went to the
army or police, for more traditional roles as
sniffers or for search-and-rescue. With her
gentle demeanour, Callie probably would
not be an attack dog. I heard that augmented
animals were being used for therapeutic pro-
cedures now. Maybe she would be trained to
be a seeing-eye dog, or used in the treatment
of post-traumatic stress disorder?

“Your name is Callie. Can you say Callie?”

Of course I was not expecting her to speak,
in the strictest sense of the word. Her lips
were not shaped for speech, and unlike the
apes, her limbs were not shaped for signing.
Her ears perked up, and
she tilted her head in the
most endearing way.

“Cal... lie...?”

The single red LED
on the speech synthesizer blinked, indicating
that the implant had successfully extracted, 
from the spatial and temporal firing patterns 
across hundreds of thousands of Callie’s 
neurons, her thoughts, emotions and inten- 
tions, and further transduced those signals 
to spoken words, complete with affective 
tone, closely mimicking human speech. 
Dr Gleitman had added yet another success 
to his list. 


I darted over to the adjoining office 
where Dr Gleitman was asleep in his reclin- 
ing chair, feeling Callie’s inquisitive eyes on 
me as I left the room. Normally, I would not
disturb him, but I thought he might want 
to know that his latest subject was awake 
and talking. I nudged his hand. He jumped 
slightly, disoriented for a second. 

“What is it, Moe?” 

“Callie’s unit is already functional, Sir! 
She managed to say her name when I asked 
her to.”

The poorly masked irritation at having his 
nap interrupted melted away into a satisfied 
smile. 

“So quickly? That's incredible. She must 
be the most talented canine we've had yet. 
Usually it takes at least a full day for speech 
comprehension to begin, let alone produc-
tion of the first word.”

He rose from his chair and walked 
to where Callie was busy pawing at her 
Elizabethan collar. “I wonder if it’s because
her youth makes her brain more malleable 
and lends itself better to the implant. Or per- 
haps it’s because she’s female. Moe, can you 
make a note of this?” 

I went to the corner of the lab where the 
video recorder was and made notes for the 
day, watching Dr Gleitman interact with 
Callie as I narrated my observations. 

“Hi Callie, do you understand me? Do 
you know where you are?” Dr Gleitman was 
already trying to extract complete sentences 
from her. That usually took weeks. Clearly
he expected more from her compared with
his previous subjects.

When I was done, I eagerly
returned to Callie’s bedside. I worked
up the courage to ask Dr Gleit-
man the question that had been
burning inside me ever since we
brought her home.

“Do you think you will keep
her?”

He looked shocked — per- 
haps I had spoken out of place. 
Surely he would say no, and Callie 
would be deployed to some faraway
base, where I'd never see her again.
But his face softened. 

“I’ve actually thought about it.
Because of her exceptional perfor- 
mance, I could easily argue for her to 
be kept here for observation and test- 
ing. Not to mention, I’m developing a 
soft spot for you Golden Retrievers.” 

I could barely keep still in my 
excitement, until I remembered what Dr 
Gleitman had said about behaving more 
like a human and not like a stray pup if I 
wanted to keep my job in the lab. But it was 
hard keeping my tail still when the news was 
making Callie’s tail wag so fast that she was 
sending strands of fur flying. Dr Gleitman 
was grinning widely at the sight. In time he 
would teach her to use language to convey 
her thoughts instead of these primal displays 
of emotion, as he had done with me. 

“Well I'm pleased both of you are happy 
with this arrangement. You'll have to do the 
paperwork though, Moe.”

“Gladly!” I would get started on that later, 
but for now I thought of what to say to wel-
come the latest addition to our home.